Bug 1085338
| Summary: | [vdsm] disk resize fails with QImgError if the lv isn't extended fast enough. | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Elad <ebenahar> | ||||
| Component: | vdsm | Assignee: | Federico Simoncelli <fsimonce> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Elad <ebenahar> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 3.4.0 | CC: | adahms, amureini, bazulay, ebenahar, gklein, iheim, kwolf, lpeer, scohen, tnisan, yeylon | ||||
| Target Milestone: | --- | Flags: | amureini: Triaged+ | ||||
| Target Release: | 3.4.0 | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | storage | ||||||
| Fixed In Version: | vdsm-4.14.7-0.1.beta3.el6ev | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2014-06-09 13:30:14 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: | logs from engine and host (attachment 884000) | ||||||
Mixed pools were only introduced in 3.4, so this can't be a regression.

Allon, the regression is for the disk resize.

Elad, does this reproduce in a non-shared DC too?

(In reply to Allon Mureinik from comment #4)
> Elad, does this reproduce in a non-shared DC too?

Works fine in a local DC for both preallocated and thin-provisioned disks.

The error is a qemu-img error (errno 28 = ENOSPC):
Thread-260379::DEBUG::2014-04-08 14:47:59,359::utils::556::root::(execCmd) '/usr/bin/qemu-img info -f qcow2 /rhev/data-center/mnt/blockSD/ae6330a3-bfae-4aef-ab4b-71074a336285/images/a32c5423-92cc-4978-a015-b8d782bd74ba/e43aee21-e097-4b3e-b164-c526035ddbc4' (cwd None)
Thread-260379::DEBUG::2014-04-08 14:47:59,557::utils::556::root::(execCmd) '/usr/bin/qemu-img resize -f qcow2 /rhev/data-center/mnt/blockSD/ae6330a3-bfae-4aef-ab4b-71074a336285/images/a32c5423-92cc-4978-a015-b8d782bd74ba/e43aee21-e097-4b3e-b164-c526035ddbc4 16106127360' (cwd None)
Thread-260379::DEBUG::2014-04-08 14:47:59,678::utils::576::root::(execCmd) FAILED: <err> = 'Error resizing image (28)\n'; <rc> = 1
Thread-260379::ERROR::2014-04-08 14:48:05,824::dispatcher::70::Storage.Dispatcher.Protect::(run) ecode=1, stdout=[], stderr=['Error resizing image (28)'], message=None
Traceback (most recent call last):
File "/usr/share/vdsm/storage/dispatcher.py", line 62, in run
result = ctask.prepare(self.func, *args, **kwargs)
File "/usr/share/vdsm/storage/task.py", line 1159, in prepare
raise self.error
QImgError: ecode=1, stdout=[], stderr=['Error resizing image (28)'], message=None
As far as I can tell the storage domain had not run out of free space, as the stats were:
{'info': {'name': 'dc1', 'isoprefix': '/rhev/data-center/mnt/wolf.qa.lab.tlv.redhat.com:_volumes_wolf_shared_shared__iso__domain1/7233a711-98e8-4c3c-bcfa-44c4bcc4f6c6/images/11111111-1111-1111-1111-111111111111', 'pool_status': 'connected', 'lver': 0, 'domains': '7233a711-98e8-4c3c-bcfa-44c4bcc4f6c6:Active,1c841bd5-e0eb-4dff-9c15-1271dab0ac9a:Active,ae6330a3-bfae-4aef-ab4b-71074a336285:Active,7efff64e-72bf-4369-9adf-8633fdef0a32:Active,abdeeae9-be09-40d5-ab28-48cc3e1b98ae:Active,e2a17289-66a0-4c3a-b5db-93518b3ff648:Attached', 'master_uuid': '7efff64e-72bf-4369-9adf-8633fdef0a32', 'version': '3', 'spm_id': 1, 'type': 'NFS', 'master_ver': 7}, 'dominfo': {'7233a711-98e8-4c3c-bcfa-44c4bcc4f6c6': {'status': 'Active', 'diskfree': '7193745162240', 'isoprefix': '/rhev/data-center/mnt/wolf.qa.lab.tlv.redhat.com:_volumes_wolf_shared_shared__iso__domain1/7233a711-98e8-4c3c-bcfa-44c4bcc4f6c6/images/11111111-1111-1111-1111-111111111111', 'alerts': [], 'disktotal': '7261350002688', 'version': 0}, '1c841bd5-e0eb-4dff-9c15-1271dab0ac9a': {'status': 'Active', 'diskfree': '42278584320', 'isoprefix': '', 'alerts': [], 'disktotal': '53284438016', 'version': 3}, 'ae6330a3-bfae-4aef-ab4b-71074a336285': {'status': 'Active', 'diskfree': '19327352832', 'isoprefix': '', 'alerts': [], 'disktotal': '52881784832', 'version': 3}, '7efff64e-72bf-4369-9adf-8633fdef0a32': {'status': 'Active', 'diskfree': '681718054912', 'isoprefix': '', 'alerts': [], 'disktotal': '1056894353408', 'version': 3}, 'abdeeae9-be09-40d5-ab28-48cc3e1b98ae': {'status': 'Active', 'diskfree': '681718054912', 'isoprefix': '', 'alerts': [], 'disktotal': '1056894353408', 'version': 3}, 'e2a17289-66a0-4c3a-b5db-93518b3ff648': {'status': 'Attached', 'isoprefix': '', 'alerts': []}}}
'ae6330a3-bfae-4aef-ab4b-71074a336285': {'alerts': [],
'diskfree': '19327352832',
'disktotal': '52881784832',
'isoprefix': '',
'status': 'Active',
'version': 3},
Bottom line: the call to qemu-img resize failed with ENOSPC, and I have no additional info as to why.
Kevin, any idea why qemu-img resize may have failed? The command was:
qemu-img resize -f qcow2 /rhev/data-center/mnt/blockSD/ae6330a3-bfae-4aef-ab4b-71074a336285/images/a32c5423-92cc-4978-a015-b8d782bd74ba/e43aee21-e097-4b3e-b164-c526035ddbc4 16106127360
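The "(28)" in the stderr line above is the raw errno (28 = ENOSPC on Linux). As a minimal sketch of turning that opaque string back into a typed error, the following helpers parse the errno out of qemu-img's message and re-raise it as an `OSError`. Both `parse_qemu_img_errno` and `qemu_img_resize` are hypothetical names for illustration, not vdsm's actual wrapper code:

```python
import re
import subprocess

def parse_qemu_img_errno(stderr):
    """Extract the trailing '(NN)' errno from a qemu-img error line,
    e.g. 'Error resizing image (28)' -> 28 (ENOSPC on Linux)."""
    match = re.search(r"\((\d+)\)\s*$", stderr.strip())
    return int(match.group(1)) if match else None

def qemu_img_resize(path, new_size_bytes):
    """Hypothetical wrapper (not vdsm's actual code): run qemu-img resize
    and surface the underlying errno, so callers can tell ENOSPC apart
    from other failures."""
    proc = subprocess.run(
        ["qemu-img", "resize", "-f", "qcow2", path, str(new_size_bytes)],
        capture_output=True, text=True)
    if proc.returncode != 0:
        code = parse_qemu_img_errno(proc.stderr)
        if code is not None:
            raise OSError(code, proc.stderr.strip())
        raise RuntimeError("qemu-img resize failed: %s" % proc.stderr.strip())
```

With the stderr seen in this log, `parse_qemu_img_errno("Error resizing image (28)")` yields 28, i.e. ENOSPC.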
It seems that there were some LV extension requests coming from the VM (watermark):
Thread-256590::INFO::2014-04-08 14:47:26,696::vm::2483::vm.Vm::(extendDrivesIfNeeded) vmId=`280e0ee1-ad2f-4234-9d7a-5908d020ce43`::Requesting extension for volume e43aee21-e097-4b3e-b164-c526035ddbc4 on domain ae6330a3-bfae-4aef-ab4b-71074a336285 (apparent: 1073741824, capacity: 10737418240, allocated: 1080167936, physical: 1073741824)
Thread-256590::DEBUG::2014-04-08 14:47:26,696::vm::2556::vm.Vm::(__extendDriveVolume) vmId=`280e0ee1-ad2f-4234-9d7a-5908d020ce43`::Requesting an extension for the volume: {'newSize': 2048, 'domainID': 'ae6330a3-bfae-4aef-ab4b-71074a336285', 'name': u'vda', 'volumeID': 'e43aee21-e097-4b3e-b164-c526035ddbc4', 'imageID': 'a32c5423-92cc-4978-a015-b8d782bd74ba', 'poolID': '009d5dc0-517f-4c87-a300-f10a6c1e1871'}
Thread-256590::INFO::2014-04-08 14:47:26,697::logUtils::44::dispatcher::(wrapper) Run and protect: sendExtendMsg(spUUID='009d5dc0-517f-4c87-a300-f10a6c1e1871', volDict={'newSize': 2048, 'domainID': 'ae6330a3-bfae-4aef-ab4b-71074a336285', 'name': u'vda', 'volumeID': 'e43aee21-e097-4b3e-b164-c526035ddbc4', 'imageID': 'a32c5423-92cc-4978-a015-b8d782bd74ba', 'poolID': '009d5dc0-517f-4c87-a300-f10a6c1e1871'}, newSize=2147483648, callbackFunc=<bound method Vm.__afterVolumeExtension of <vm.Vm object at 0x7f41941b8b50>>)
Thread-260360::DEBUG::2014-04-08 14:47:27,820::BindingXMLRPC::1062::vds::(wrapper) client [10.35.161.52]::call vmDestroy with ('280e0ee1-ad2f-4234-9d7a-5908d020ce43',) {} flowID [30cf70c4]
Relevant reported sizes: allocated: 1080167936, physical: 1073741824
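Those two numbers show why an extension was requested: the qcow2 allocation (1080167936) had already passed the LV's physical size (1073741824). The watermark mechanism can be sketched roughly as below; the 512 MiB threshold is purely illustrative and the 2 GiB chunk mirrors the newSize=2048 MiB request in the log, but neither is vdsm's actual configuration:

```python
GiB = 1024 ** 3

def needs_extension(allocated, physical, threshold=512 * 1024 ** 2):
    """Hedged sketch of a watermark check: ask for more space once the
    highest-allocated qcow2 offset gets within `threshold` bytes of the
    LV's physical size (or past it, as in this bug's log)."""
    return physical - allocated < threshold

def next_lv_size(physical, chunk=2 * GiB):
    """Grow the LV in fixed chunks; 2 GiB matches the newSize=2048 (MiB)
    request seen in the sendExtendMsg log line above."""
    return physical + chunk
```

Plugging in the logged values, `needs_extension(1080167936, 1073741824)` is true, which is consistent with the extension request fired at 14:47:26.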
It seems that there was no SPM to satisfy the requests (or it didn't get to them fast enough), but that's a different issue.
Anyway, the final situation that triggered the issue is that the LV was full and the qemu-img resize failed with ENOSPC:
Thread-260379::DEBUG::2014-04-08 14:47:59,557::utils::556::root::(execCmd) '/usr/bin/qemu-img resize -f qcow2 /rhev/data-center/mnt/blockSD/ae6330a3-bfae-4aef-ab4b-71074a336285/images/a32c5423-92cc-4978-a015-b8d782bd74ba/e43aee21-e097-4b3e-b164-c526035ddbc4 16106127360' (cwd None)
Thread-260379::DEBUG::2014-04-08 14:47:59,678::utils::576::root::(execCmd) FAILED: <err> = 'Error resizing image (28)\n'; <rc> = 1
Thread-260379::ERROR::2014-04-08 14:48:05,824::dispatcher::70::Storage.Dispatcher.Protect::(run) ecode=1, stdout=[], stderr=['Error resizing image (28)'], message=None
Traceback (most recent call last):
File "/usr/share/vdsm/storage/dispatcher.py", line 62, in run
result = ctask.prepare(self.func, *args, **kwargs)
File "/usr/share/vdsm/storage/task.py", line 1159, in prepare
raise self.error
QImgError: ecode=1, stdout=[], stderr=['Error resizing image (28)'], message=None
A possible solution would be to catch the specific ENOSPC error and extend the LV before trying again.
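That proposal can be sketched as a small retry loop. `resize` and `extend_lv` are hypothetical callables standing in for vdsm's qemu-img wrapper and its LVM extend operation; this is an illustration of the idea, not the patch that was shipped:

```python
import errno

def resize_with_retry(resize, extend_lv, lv_path, new_size, attempts=3):
    """Sketch of the proposed fix: if the image resize fails with ENOSPC,
    extend the backing LV and retry. Any other error, or exhausting the
    retry budget, propagates to the caller."""
    for attempt in range(attempts):
        try:
            return resize(lv_path, new_size)
        except OSError as exc:
            if exc.errno != errno.ENOSPC or attempt == attempts - 1:
                raise
            extend_lv(lv_path)  # grow the LV, then retry the resize
```

The key point is that ENOSPC from qemu-img is treated as "the LV is too small right now" rather than as a fatal error, which covers the race where the watermark extension hasn't landed yet.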
(In reply to Elad from comment #5)
> (In reply to Allon Mureinik from comment #4)
> > Elad, does this reproduce in a non-shared DC too?
>
> Works fine in local DC for both preallocated and for thin-provision disks

Can you please test with a non-mixed block domain? (e.g., iSCSI only)

Tried several live resizes, with all combinations (block and file, with thin and preallocated disks) on a non-shared DC. All live resizes went fine.

(In reply to Federico Simoncelli from comment #7)
> It seems that there were some LV extension requests coming from the VM
> (watermark):
> [extendDrivesIfNeeded / sendExtendMsg / vmDestroy log excerpt quoted
> verbatim from comment #7; see those log lines above]
>
> Relevant reported sizes: allocated: 1080167936, physical: 1073741824
>
> It seems that there was no SPM to satisfy the requests (or it didn't get to
> it fast enough) but that's a different issue.

This was because of bug 1083476.

As the underlying issue is bug 1083476, moving this one to ON_QA for verification (which may already have been done, according to comment 9).

(In reply to Allon Mureinik from comment #11)
> As the underlying issue is bug 1083476, moving this one to ON_QA for
> verification (which may already have been done, according to comment 9).

Allon,
My test described in comment #9 was on a non-shared DC. The bug occurred while I used a shared DC, so I'm not sure why this bug was moved to ON_QA.

(In reply to Elad from comment #12)
> My test described in comment #9 was on a non-shared DC.
> The bug occurred while I used a shared DC, so I'm not sure why this bug was
> moved to ON_QA.

The original bug was described as:
> failure to resize a disk which resides on a block domain when the pool type is
> reported as file (NFS)

This, by definition, is a shared DC. What am I missing?

On a shared DC with an NFS domain as master and an iSCSI domain:
Created a VM with a disk residing on the iSCSI domain and resized it from 2G to 4G. The resize succeeded; tried with both preallocated and thin-provisioned disks.

Verified using av9.1

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0504.html
Created attachment 884000 [details]
logs from engine and host

Description of problem:
Failure to resize a disk which resides on a block domain when the pool type is reported as file (NFS) by getStoragePoolInfo.

Version-Release number of selected component (if applicable):
rhevm-3.4.0-0.12.beta2.el6ev.noarch
vdsm-4.14.2-0.2.el6ev.x86_64
libvirt-0.10.2-29.el6_5.5.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.7.x86_64

How reproducible:
Not always

Steps to Reproduce:
On a shared DC with 2 storage domains of different types (file/block), make sure that the master domain is the file domain (NFS in my case). Check that the pool type is NFS:
# vdsClient -s 0 getStoragePoolInfo
1. Create a VM with a 1G disk, located on the block domain, attached to it
2. Resize the disk (add 5G more)

Actual results:
Pool type is NFS:

[root@green-vdsa ~]# vdsClient -s 0 getStoragePoolInfo `vdsClient -s 0 getConnectedStoragePoolsList`
	name = dc1
	isoprefix = /rhev/data-center/mnt/wolf.qa.lab.tlv.redhat.com:_volumes_wolf_shared_shared__iso__domain1/7233a711-98e8-4c3c-bcfa-44c4bcc4f6c6/images/11111111-1111-1111-1111-111111111111
	pool_status = connected
	lver = 0
	spm_id = 1
	master_uuid = 7efff64e-72bf-4369-9adf-8633fdef0a32
	version = 3
	domains = 7233a711-98e8-4c3c-bcfa-44c4bcc4f6c6:Active,1c841bd5-e0eb-4dff-9c15-1271dab0ac9a:Active,ae6330a3-bfae-4aef-ab4b-71074a336285:Active,7efff64e-72bf-4369-9adf-8633fdef0a32:Active,abdeeae9-be09-40d5-ab28-48cc3e1b98ae:Active,e2a17289-66a0-4c3a-b5db-93518b3ff648:Attached
	type = NFS
	master_ver = 7
	7233a711-98e8-4c3c-bcfa-44c4bcc4f6c6 = {'status': 'Active', 'diskfree': '7193745162240', 'isoprefix': '/rhev/data-center/mnt/wolf.qa.lab.tlv.redhat.com:_volumes_wolf_shared_shared__iso__domain1/7233a711-98e8-4c3c-bcfa-44c4bcc4f6c6/images/11111111-1111-1111-1111-111111111111', 'alerts': [], 'disktotal': '7261350002688', 'version': 0}
	1c841bd5-e0eb-4dff-9c15-1271dab0ac9a = {'status': 'Active', 'diskfree': '42278584320', 'isoprefix': '', 'alerts': [], 'disktotal': '53284438016', 'version': 3}
	ae6330a3-bfae-4aef-ab4b-71074a336285 = {'status': 'Active', 'diskfree': '19327352832', 'isoprefix': '', 'alerts': [], 'disktotal': '52881784832', 'version': 3}
	7efff64e-72bf-4369-9adf-8633fdef0a32 = {'status': 'Active', 'diskfree': '681718054912', 'isoprefix': '', 'alerts': [], 'disktotal': '1056894353408', 'version': 3}
	abdeeae9-be09-40d5-ab28-48cc3e1b98ae = {'status': 'Active', 'diskfree': '681718054912', 'isoprefix': '', 'alerts': [], 'disktotal': '1056894353408', 'version': 3}
	e2a17289-66a0-4c3a-b5db-93518b3ff648 = {'status': 'Attached', 'isoprefix': '', 'alerts': []}

vdsm fails to resize the disk:

Thread-260379::ERROR::2014-04-08 14:48:05,824::dispatcher::70::Storage.Dispatcher.Protect::(run) ecode=1, stdout=[], stderr=['Error resizing image (28)'], message=None
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/dispatcher.py", line 62, in run
    result = ctask.prepare(self.func, *args, **kwargs)
  File "/usr/share/vdsm/storage/task.py", line 1159, in prepare
    raise self.error
QImgError: ecode=1, stdout=[], stderr=['Error resizing image (28)'], message=None

Expected results:
Disk resize should work when the domain where the disk is located is of a different type from the pool type.

Additional info:
Logs from engine and host are attached.