Bug 1201268
Summary: CopyImage on iSCSI sometimes fails when cloning a VM from a snapshot

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | Raz Tamir <ratamir> |
| Component: | General | Assignee: | Greg Padgett <gpadgett> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Elad <ebenahar> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.16.12 | CC: | alitke, amureini, aperotti, bazulay, bcholler, bugs, cmestreg, colin.coe, ebenahar, gchakkar, gklein, gpadgett, gveitmic, jentrena, lpeer, lsurette, ratamir, tnisan, v.astafiev, yeylon, ykaul, ylavi |
| Target Milestone: | ovirt-3.6.3 | Keywords: | AutomationBlocker, Reopened |
| Target Release: | 4.17.23 | Flags: | rule-engine: ovirt-3.6.z+, ylavi: planning_ack+, rule-engine: devel_ack+, rule-engine: testing_ack+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| Cloned To: | 1301709 (view as bug list) | Environment: | |
| Last Closed: | 2016-03-10 12:48:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1301709, 1308626 | | |
| Attachments: | | | |
Description

Raz Tamir, 2015-03-12 12:31:07 UTC

---

Raz, can you list the VDSM build you used? Thanks.

This isn't the root cause, but the actual error is being masked from us:

    CopyImageError: low level Image copy failed: (u"src image=e14d2630-6d3a-47b9-ad86-81705357c03e, dst image=a3b0dddd-ceb8-41c8-9a27-0aa0d9207736: msg=local variable 'e' referenced before assignment",)

---

Hi Allon,

    Red Hat Enterprise Linux Server release 7.1 (Maipo)
    vdsm-4.16.12-2.el7ev.x86_64
    qemu-img-rhev-2.1.2-23.el7_1.1.x86_64
    libvirt-1.2.8-16.el7.x86_64

---

Raz, it's extremely difficult to trace this error due to the error described in comment 2. Can you try to apply https://gerrit.ovirt.org/#/c/38669/2 on your setup, reproduce, and supply the logs? Thanks!

---

Created attachment 1005736 [details]
new logs

Hi Allon, new logs are attached, taken after applying https://gerrit.ovirt.org/#/c/38669/2.

---

Hi Raz, the attached logs do not seem to include the fix. Can you share your system's details so we can apply it manually?

---

The relevant error:

    qemu-img: error while writing sector 2093056: No space left on device

Seems pretty straightforward. Are you sure there is indeed space on the device?

---

I will test it and update soon.

---

I tried to reproduce it and it's not easy. My suggestion is to keep this bug open while we run the relevant cases; if it occurs again, I will update. I'm setting NEEDINFO on myself to be alerted in 1 week.

---

(In reply to ratamir from comment #29)
> I tried to reproduce it and it's not easy. My suggestion is to keep this
> bug open while we run the relevant cases; if it occurs again, I will update.
> I'm setting NEEDINFO on myself to be alerted in 1 week.

This just adds noise to the scrubbing efforts. Considering that the obvious suspect, bug 1160620, is already fixed and we have zero evidence in the logs to suggest anything else is going on, I'm closing this as not reproducible. If you ARE able to reproduce it, please reopen.
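The masked error quoted above ("local variable 'e' referenced before assignment") is a classic Python exception-handling pitfall: a handler re-raises using a variable that is only bound on one code path. A minimal, hypothetical sketch of that pattern — the names `copy_collapsed` and `QImgError` only mirror the traceback; this is not vdsm's actual code:

```python
# Hypothetical reproduction of the error-masking pattern; not vdsm's code.

class QImgError(Exception):
    pass

class CopyImageError(Exception):
    pass

def copy_collapsed(convert):
    """Run a conversion step, intending to re-raise any failure as CopyImageError."""
    try:
        convert()
    except QImgError as e:
        pass  # 'e' is only bound on this branch (and Python 3 unbinds it on exit)
    except OSError:
        pass  # on this branch 'e' is never assigned at all
    # Referencing 'e' here raises UnboundLocalError, masking the real failure:
    raise CopyImageError(str(e))

def failing_convert():
    raise OSError("No space left on device")

try:
    copy_collapsed(failing_convert)
except UnboundLocalError as masked:
    # The caller sees a complaint about 'e' instead of the original
    # "No space left on device" error.
    print(masked)
```

This is why the real qemu-img error ("No space left on device") only surfaced after the diagnostic patch was applied.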
Ok, thanks.

---

Hi Allon, I'm reopening this as it happened again (rhevm-3.6.0-0.12).

From vdsm.log:

    a3508590-3c5e-4d7f-bbab-c37a69ba2da0::DEBUG::2015-08-26 12:02:13,978::utils::708::root::(watchCmd) FAILED: <err> = ['qemu-img: error while writing sector 4190208: No space left on device', 'qemu-img: Failed to flush the refcount block cache: No space left on device']; <rc> = 1
    a3508590-3c5e-4d7f-bbab-c37a69ba2da0::ERROR::2015-08-26 12:02:13,979::image::841::Storage.Image::(copyCollapsed) conversion failure for volume dc7069c4-2f03-4568-893c-c3b2ec10389b
    Traceback (most recent call last):
      File "/usr/share/vdsm/storage/image.py", line 836, in copyCollapsed
        volume.fmt2str(dstVolFormat))
      File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 209, in convert
        raise QImgError(rc, out, err)
    QImgError: ecode=1, stdout=[], stderr=['qemu-img: error while writing sector 4190208: No space left on device', 'qemu-img: Failed to flush the refcount block cache: No space left on device'], message=None
    a3508590-3c5e-4d7f-bbab-c37a69ba2da0::ERROR::2015-08-26 12:02:13,980::image::854::Storage.Image::(copyCollapsed) Unexpected error
    Traceback (most recent call last):
      File "/usr/share/vdsm/storage/image.py", line 842, in copyCollapsed
        raise se.CopyImageError(str(e))
    CopyImageError: low level Image copy failed: ("ecode=1, stdout=[], stderr=['qemu-img: error while writing sector 4190208: No space left on device', 'qemu-img: Failed to flush the refcount block cache: No space left on device'], message=None",)

From engine.log:

    2015-08-26 11:59:54,157 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CopyImageVDSCommand] (ajp-/127.0.0.1:8702-11) [1a09eb2b] START, CopyImageVDSCommand( CopyImageVDSCommandParameters:{runAsync='true', storagePoolId='0c21cb15-3549-4211-acb2-f0319d33f078', ignoreFailoverLimit='false', storageDomainId='874fa9cf-a6cd-4163-8d30-f39b1a550e0f', imageGroupId='ba5b6007-7ab4-4b51-933a-abb13d248bd0', imageId='6861dfa0-583c-4e7c-a800-e408f63810bd', dstImageGroupId='044f01df-585b-46be-a61d-4ddc8daf9b9d', vmId='ac3038af-bbae-43f3-89f8-f50c0ce31894', dstImageId='480c6c14-3384-47c9-bd1c-fbc82949701c', imageDescription='', dstStorageDomainId='4f3f28b4-9472-4bf2-aff0-ee9d10932fb8', copyVolumeType='LeafVol', volumeFormat='COW', preallocate='Sparse', postZero='false', force='false'}), log id: 51b21bd2
    2015-08-26 11:59:55,772 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CopyImageVDSCommand] (ajp-/127.0.0.1:8702-11) [1390691c] START, CopyImageVDSCommand( CopyImageVDSCommandParameters:{runAsync='true', storagePoolId='0c21cb15-3549-4211-acb2-f0319d33f078', ignoreFailoverLimit='false', storageDomainId='874fa9cf-a6cd-4163-8d30-f39b1a550e0f', imageGroupId='4248202c-54eb-4acb-a2a4-b70309a09955', imageId='dc7069c4-2f03-4568-893c-c3b2ec10389b', dstImageGroupId='cae267c2-3d3a-4886-8441-4a4472b11f09', vmId='ac3038af-bbae-43f3-89f8-f50c0ce31894', dstImageId='1492778c-bb67-43f7-9460-eddd317e7bd4', imageDescription='', dstStorageDomainId='4f3f28b4-9472-4bf2-aff0-ee9d10932fb8', copyVolumeType='LeafVol', volumeFormat='COW', preallocate='Sparse', postZero='false', force='false'}), log id: 77b9046a
    2015-08-26 12:02:37,599 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-71) [3a2960f9] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM host_mixed_1 command failed: low level Image copy failed
    2015-08-26 12:02:37,613 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler_Worker-71) [3a2960f9] BaseAsyncTask::logEndTaskFailure: Task 'a3508590-3c5e-4d7f-bbab-c37a69ba2da0' (Parent Command 'AddVmFromSnapshot', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure: -- Result: 'cleanSuccess' -- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed, code = 261', -- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed, code = 261'

And about comment 27 ("Are you sure there is indeed space on the device?"): the SD before the VM creation from template:

    <storage_domain href="/api/storagedomains/4f3f28b4-9472-4bf2-aff0-ee9d10932fb8" id="4f3f28b4-9472-4bf2-aff0-ee9d10932fb8">
      <name>iscsi_1</name>
      <data_centers>
        <data_center id="0c21cb15-3549-4211-acb2-f0319d33f078"/>
      </data_centers>
      <type>data</type>
      <master>false</master>
      <storage>
        <type>iscsi</type>
        <volume_group id="4NV0hb-8Zd2-uazx-Mydx-kZxK-lTFe-YW1Xzf">
          <logical_unit id="360060160f4a03000afc22c03baece411">
            …..
            <size>161061273600</size>
          </logical_unit>
          <logical_unit id="360060160f4a03000afc22c03baece411">
            …..
            <size>161061273600</size>
          </logical_unit>
        </volume_group>
      </storage>
      <available>155692564480</available>
      <used>4294967296</used>
      <committed>0</committed>
    </storage_domain>

New logs attached (vdsm + engine) and full run log.

---

Created attachment 1067220 [details]
vdsm + engine logs
Created attachment 1067221 [details]
run log
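The `QImgError` in the traceback above is what finally surfaced qemu-img's real stderr ("No space left on device") to the caller instead of a generic failure. A hedged sketch of that wrapper pattern — a stand-in, not vdsm's actual `qemuimg.convert`:

```python
import subprocess

class QImgError(Exception):
    """Carries the conversion command's exit code and output so callers
    can log the real cause instead of a generic failure."""
    def __init__(self, ecode, stdout, stderr):
        super().__init__("ecode=%s, stdout=%s, stderr=%s" % (ecode, stdout, stderr))
        self.ecode = ecode
        self.stdout = stdout
        self.stderr = stderr

def run_convert(cmd):
    """Run an image-conversion command; raise QImgError on failure,
    preserving stdout/stderr line by line (mirroring the log format above)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, universal_newlines=True)
    if proc.returncode != 0:
        raise QImgError(proc.returncode,
                        proc.stdout.splitlines(),
                        proc.stderr.splitlines())
    return proc.stdout

# A real invocation would look roughly like this (device paths hypothetical):
# run_convert(["qemu-img", "convert", "-f", "qcow2", "-O", "raw",
#              "/dev/vg/src_lv", "/dev/vg/dst_lv"])
```

The design point: by keeping stderr attached to the exception, a "No space left on device" or "Device is too small" message survives all the way into the engine log.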
---

Raz, please supply us with a clear reproducer for this issue.

---

1) Create a VM with 2 disks, 2 GB each: 1 thin, 1 preallocated.
2) Create a snapshot.
3) Clone a VM from that snapshot.

Commands sent using the REST API (from the attached run log):

2) At 2015-08-26 11:59:10,170

3) At 2015-08-26 12:03:57,114

---

(In reply to ratamir from comment #36)
> 1) Create a VM with 2 disks, 2 GB each: 1 thin, 1 preallocated.
> 2) Create a snapshot.
> 3) Clone a VM from that snapshot.
>
> Commands sent using the REST API (from the attached run log):
>
> 2) At 2015-08-26 11:59:10,170
>
> 3) At 2015-08-26 12:03:57,114

Raz - a couple of questions to help us assess what's going on here:

1. Does this reproduce on EVERY copyImage call, or just sporadically?
2. Is this limited to a single storage server, or does it reproduce in different setups?
3. Can you share the list of libvirt*, qemu*, and vdsm* RPMs on the host?

Thanks!

---

Allon,

1) No, this isn't easy to reproduce, but it happens from time to time in our automation.
2) This is not limited to a single setup; I checked it on 4 different setups.
3) Sure:

    # rpm -qa | grep libvirt*
    libvirt-daemon-kvm-1.2.17-5.el7.x86_64
    libvirt-daemon-driver-nwfilter-1.2.17-5.el7.x86_64
    libvirt-daemon-driver-lxc-1.2.17-5.el7.x86_64
    libvirt-daemon-driver-storage-1.2.17-5.el7.x86_64
    libvirt-lock-sanlock-1.2.17-5.el7.x86_64
    libvirt-client-1.2.17-5.el7.x86_64
    libvirt-daemon-driver-network-1.2.17-5.el7.x86_64
    libvirt-daemon-config-nwfilter-1.2.17-5.el7.x86_64
    libvirt-daemon-driver-qemu-1.2.17-5.el7.x86_64
    libvirt-daemon-driver-nodedev-1.2.17-5.el7.x86_64
    libvirt-daemon-driver-interface-1.2.17-5.el7.x86_64
    libvirt-1.2.17-5.el7.x86_64
    libvirt-daemon-1.2.17-5.el7.x86_64
    libvirt-daemon-config-network-1.2.17-5.el7.x86_64
    libvirt-daemon-driver-secret-1.2.17-5.el7.x86_64
    libvirt-python-1.2.17-2.el7.x86_64

    # rpm -qa | grep vdsm*
    vdsm-hook-ethtool-options-4.17.3-1.el7ev.noarch
    vdsm-xmlrpc-4.17.3-1.el7ev.noarch
    vdsm-4.17.3-1.el7ev.noarch
    vdsm-jsonrpc-4.17.3-1.el7ev.noarch
    vdsm-python-4.17.3-1.el7ev.noarch
    vdsm-cli-4.17.3-1.el7ev.noarch
    vdsm-infra-4.17.3-1.el7ev.noarch
    vdsm-yajsonrpc-4.17.3-1.el7ev.noarch

---

This is an automated message. oVirt 3.6.0 RC3 has been released, and GA is targeted for next week, Nov 4th 2015. Please review this bug and, if it is not a blocker, please postpone it to a later release. All bugs not postponed on the GA release will be automatically re-targeted to:

- 3.6.1 if severity >= high
- 4.0 if severity < high

---

Seems to be reproduced with the following scenario:

- VM with more than one snapshot
- Merge one of the snapshots
- Clone a VM from the other snapshot

CopyImage fails with "low level Image copy failed".

---

(In reply to Elad from comment #40)
> Seems to be reproduced with the following scenario:
> - VM with more than one snapshot
> - Merge one of the snapshots
> - Clone a VM from the other snapshot
>
> CopyImage fails with "low level Image copy failed"

Can you please share the RPM versions and the logs for this scenario? Thanks!
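Step 3 of the reproducer (clone a VM from a snapshot) maps to a single REST call: in the oVirt API the new VM resource embeds the source snapshot id. The payload builder below is a hedged sketch based on the API's general conventions — the element names and the `/api/vms` endpoint are assumptions, not copied from the attached run log, and the snapshot id is a placeholder:

```python
import xml.etree.ElementTree as ET

def clone_from_snapshot_payload(vm_name, cluster_name, snapshot_id):
    """Build an XML body for a clone-from-snapshot VM creation request
    (hypothetical shape; check your engine version's API reference)."""
    vm = ET.Element("vm")
    ET.SubElement(vm, "name").text = vm_name
    cluster = ET.SubElement(vm, "cluster")
    ET.SubElement(cluster, "name").text = cluster_name
    snapshots = ET.SubElement(vm, "snapshots")
    # The embedded snapshot id tells the engine which snapshot to clone from.
    ET.SubElement(snapshots, "snapshot", id=snapshot_id)
    return ET.tostring(vm, encoding="unicode")

body = clone_from_snapshot_payload("vm_clone", "Default", "SNAPSHOT-UUID")
# POST body to the engine's /api/vms with Content-Type: application/xml
```

On the engine side, such a request is what spawns the `AddVmFromSnapshot` command and the `CopyImageVDSCommand` calls seen in the logs above.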
---

Created attachment 1101802 [details]
logs from engine and host

(In reply to Allon Mureinik from comment #41)
> Can you please share the RPM versions and the logs for this scenario?
> Thanks!

    rhevm-3.6.1-0.2.el6.noarch
    vdsm-cli-4.17.11-0.el7ev.noarch
    vdsm-yajsonrpc-4.17.11-0.el7ev.noarch
    vdsm-infra-4.17.11-0.el7ev.noarch
    vdsm-xmlrpc-4.17.11-0.el7ev.noarch
    vdsm-4.17.11-0.el7ev.noarch
    vdsm-python-4.17.11-0.el7ev.noarch
    vdsm-jsonrpc-4.17.11-0.el7ev.noarch
    libvirt-daemon-driver-network-1.2.17-13.el7.x86_64
    libvirt-daemon-config-nwfilter-1.2.17-13.el7.x86_64
    libvirt-client-1.2.17-13.el7.x86_64
    libvirt-python-1.2.17-2.el7.x86_64
    libvirt-daemon-driver-interface-1.2.17-13.el7.x86_64
    libvirt-daemon-driver-nwfilter-1.2.17-13.el7.x86_64
    libvirt-daemon-driver-secret-1.2.17-13.el7.x86_64
    libvirt-daemon-driver-storage-1.2.17-13.el7.x86_64
    libvirt-daemon-kvm-1.2.17-13.el7.x86_64
    libvirt-lock-sanlock-1.2.17-13.el7.x86_64
    libvirt-daemon-driver-qemu-1.2.17-13.el7.x86_64
    libvirt-daemon-1.2.17-13.el7.x86_64
    libvirt-daemon-driver-nodedev-1.2.17-13.el7.x86_64
    qemu-img-rhev-2.3.0-31.el7_2.4.x86_64
    ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch
    qemu-kvm-tools-rhev-2.3.0-31.el7_2.4.x86_64
    qemu-kvm-common-rhev-2.3.0-31.el7_2.4.x86_64
    qemu-kvm-rhev-2.3.0-31.el7_2.4.x86_64
    libvirt-daemon-driver-qemu-1.2.17-13.el7.x86_64
    sanlock-3.2.4-1.el7.x86_64
    selinux-policy-3.13.1-60.el7.noarch
    python-2.7.5-34.el7.x86_64
    kernel - 3.10.0-327.el7.x86_64

From the logs:

    cafa593d-5b52-490a-a4bf-6c9797eec778::ERROR::2015-12-03 19:59:37,053::image::841::Storage.Image::(copyCollapsed) conversion failure for volume bf3bb47f-2de6-407f-a689-5867765b3ea7
    Traceback (most recent call last):
      File "/usr/share/vdsm/storage/image.py", line 836, in copyCollapsed
        volume.fmt2str(dstVolFormat))
      File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 209, in convert
        raise QImgError(rc, out, err)
    QImgError: ecode=1, stdout=[], stderr=['qemu-img: /rhev/data-center/mnt/blockSD/3273d259-ea6c-4a9e-97a7-e5f3020bfa77/images/1407a188-f1e4-408d-9f5f-c7e38d8e0423/32aa855d-8337-4f4c-9688-77a7f716824f: error while converting raw: Device is too small'], message=None
    cafa593d-5b52-490a-a4bf-6c9797eec778::ERROR::2015-12-03 19:59:37,070::image::854::Storage.Image::(copyCollapsed) Unexpected error
    Traceback (most recent call last):
      File "/usr/share/vdsm/storage/image.py", line 842, in copyCollapsed
        raise se.CopyImageError(str(e))
    CopyImageError: low level Image copy failed: ("ecode=1, stdout=[], stderr=['qemu-img: /rhev/data-center/mnt/blockSD/3273d259-ea6c-4a9e-97a7-e5f3020bfa77/images/1407a188-f1e4-408d-9f5f-c7e38d8e0423/32aa855d-8337-4f4c-9688-77a7f716824f: error while converting raw: Device is too small'], message=None",)

---

Now that we have a proper error message:

    CopyImageError: low level Image copy failed: ("ecode=1, stdout=[], stderr=['qemu-img: /rhev/data-center/mnt/blockSD/3273d259-ea6c-4a9e-97a7-e5f3020bfa77/images/1407a188-f1e4-408d-9f5f-c7e38d8e0423/32aa855d-8337-4f4c-9688-77a7f716824f: error while converting raw: Device is too small'], message=None",)

Elad/Raz - do you really have such a small device? If not, it may be a regression in qemu-img.

---

Adam - https://gerrit.ovirt.org/#/c/53318/ is merged to the 3.6 branch. Is there anything else we need here?

---

This was solved by the same patch as bug 1301709. Moving to ON_QA to verify. Nothing else is required.

---

Hi Greg, can I verify based on https://bugzilla.redhat.com/show_bug.cgi?id=1301709#c8 and https://bugzilla.redhat.com/show_bug.cgi?id=1301709#c9 ?

---

Moving to VERIFIED based on https://bugzilla.redhat.com/show_bug.cgi?id=1301709#c8 and https://bugzilla.redhat.com/show_bug.cgi?id=1301709#c9.
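Both failure modes in this report ("No space left on device" and "Device is too small") reduce to the destination block volume being allocated smaller than what qemu-img needs to write the collapsed chain. A hedged sketch of the kind of pre-flight size estimate involved — the 1.1 qcow2 metadata-overhead factor and the 1 MiB rounding are illustrative assumptions, not taken from the merged patch:

```python
# Illustrative size-estimation sketch; constants are assumptions.

QCOW_OVERHEAD = 1.1   # assumed headroom for qcow2 metadata (refcounts, L1/L2 tables)
MiB = 1024 * 1024

def required_dest_size(virtual_size, dst_format):
    """Minimum destination device size for a collapsed copy,
    rounded up to 1 MiB granularity."""
    size = virtual_size
    if dst_format == "cow":          # qcow2 output also needs metadata room
        size = int(size * QCOW_OVERHEAD)
    return -(-size // MiB) * MiB     # ceiling division, then back to bytes

def device_big_enough(device_size, virtual_size, dst_format):
    return device_size >= required_dest_size(virtual_size, dst_format)

# For the 2 GB disks from the reproducer: a raw destination of exactly
# 2 GiB suffices, but a qcow2 destination of exactly 2 GiB is "too small".
two_gib = 2 * 1024 * MiB
assert device_big_enough(two_gib, two_gib, "raw")
assert not device_big_enough(two_gib, two_gib, "cow")
```

Sizing the destination from the full virtual size of the collapsed chain (rather than from the source leaf volume alone) is the general idea behind the fix shared with bug 1301709.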