Bug 1320050

Summary: [vdsm] Snapshot merge fails on "SourceImageActionError: Error during source image manipulation"
Product: [oVirt] vdsm Reporter: Elad <ebenahar>
Component: GeneralAssignee: Ala Hino <ahino>
Status: CLOSED CURRENTRELEASE QA Contact: Kevin Alon Goldblatt <kgoldbla>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 4.18.0CC: ahino, amureini, bugs, dev-unix-virtualization, mst, stirabos, tnisan
Target Milestone: ovirt-4.0.0-rcKeywords: Regression
Target Release: 4.18.9Flags: rule-engine: ovirt-4.0.0+
rule-engine: blocker+
rule-engine: planning_ack+
rule-engine: devel_ack+
rule-engine: testing_ack+
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-07-05 07:45:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
vdsm, libvirt, messages and engine logs
none
vdsm server and engine logs none

Description Elad 2016-03-22 08:40:55 UTC
Description of problem:
Snapshot merge (cold) fails every time, on file and block storage, with the following error message in vdsm:


d97ca3a1-7b65-47eb-b8df-a3664f4e43a6::ERROR::2016-03-22 10:27:49,527::image::1338::Storage.Image::(merge) can't set attribute
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 1321, in merge
    [vols[vName] for vName in chain])
  File "/usr/share/vdsm/storage/image.py", line 1146, in _baseRawVolumeMerge
    srcVol.rename(tmpUUID)
  File "/usr/share/vdsm/storage/fileVolume.py", line 589, in rename
    self._manifest.volumePath = volPath
AttributeError: can't set attribute

d97ca3a1-7b65-47eb-b8df-a3664f4e43a6::ERROR::2016-03-22 10:27:49,532::task::868::Storage.TaskManager.Task::(_setError) Task=`d97ca3a1-7b65-47eb-b8df-a3664f4e43a6`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 875, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 334, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 78, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1864, in mergeSnapshots
    sdUUID, vmUUID, imgUUID, ancestor, successor, postZero)
  File "/usr/share/vdsm/storage/image.py", line 1339, in merge
    raise se.SourceImageActionError(imgUUID, sdUUID, str(e))
SourceImageActionError: Error during source image manipulation: u"image=5c912ed8-7c34-476e-8a07-8e28c2a49e95, source domain=dc52ca50-9f03-45c9-be9e-1a4a1f84cdde: can't set attribute"


Version-Release number of selected component (if applicable):
vdsm-4.17.999-724.gitb8cb30a.el7.centos.noarch
libvirt-daemon-1.2.17-13.el7_2.4.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.9.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a VM
2. Create a snapshot 
3. Delete the snapshot
(Reproduced over file and block storage)

Actual results:
Snspshot merge fails

Expected results:
Snapshot merge succeeds

Additional info:
vdsm, libvirt, messages and engine logs

Comment 1 Elad 2016-03-22 08:45:50 UTC
ovirt-engine-4.0.0-0.0.master.20160310051350.git3151c66.el7.centos.noarch

Comment 2 Tal Nisan 2016-03-23 09:44:43 UTC
Adam, as this week's QE contact please have a look

Comment 3 Red Hat Bugzilla Rules Engine 2016-03-23 09:44:50 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 4 Elad 2016-03-24 13:54:56 UTC
Created attachment 1139978 [details]
vdsm, libvirt, messages and engine logs

Comment 5 Allon Mureinik 2016-03-28 14:16:02 UTC
(In reply to Elad from comment #0)
> Description of problem:
> Snapshot merge (cold) fails every time, on file and block storage, with the
> following error message in vdsm:
> 
> 
> d97ca3a1-7b65-47eb-b8df-a3664f4e43a6::ERROR::2016-03-22
> 10:27:49,527::image::1338::Storage.Image::(merge) can't set attribute
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/image.py", line 1321, in merge
>     [vols[vName] for vName in chain])
>   File "/usr/share/vdsm/storage/image.py", line 1146, in _baseRawVolumeMerge
>     srcVol.rename(tmpUUID)
>   File "/usr/share/vdsm/storage/fileVolume.py", line 589, in rename
>     self._manifest.volumePath = volPath
> AttributeError: can't set attribute
The _manifest was only introduced post 3.6, re-targettitng accordingly.

Comment 6 dev-unix-virtualization 2016-04-08 17:34:30 UTC
We have faced the same issue. But, deletion of 2nd snapshot of a power-off VM happens successfully.

We have not seen this issue with live VM.

We are using ovirt-engine-3.6.2.6-1.el7.centos.noarch and vdsm-4.17.999-900.giteba4b25.el7.centos.noarch

Comment 7 Sandro Bonazzola 2016-05-02 09:52:41 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.

Comment 8 Ala Hino 2016-05-15 14:52:03 UTC
This was fixed as part of the work done for 3.6.6 - moving to ON_QA.

Comment 9 Allon Mureinik 2016-05-22 11:45:21 UTC
Ala, is there anything we need to document here, or is it documented elsewhere?
Please either provide the doctext, or comment with the BZ tracking the doc text and set requires-doctext-.

Comment 10 Ala Hino 2016-05-22 11:49:28 UTC
BZ 1323629 documents the behavior

Comment 11 Kevin Alon Goldblatt 2016-06-05 15:02:12 UTC
Test the with the following code:
------------------------------------------
rhevm-4.0.0.2-0.1.el7ev.noarch
vdsm-4.18.1-11.gita92976e.el7ev.x86_64


Tested with the following scenario:
-------------------------------------
1.Created Vm with 3 disks, 1 iscsi - preallocated, 1 iscsi - thin and 1 nfs - thin
2. Created snapshots sbu1, sbu2 and sbu3 while VM is dowm
3. Started the VM and created su1, su2 and su3
4. Selected snbu2 and tried deleting >>>>> The operation failed with "Failed to delete snapshot 'sbu2' for VM 'vm1'."
vdsm log displays error Volume  does not exist"






Engine.log:
-----------------
2016-06-05 14:11:54,499 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [] Correlation ID: 69d26cca, Job ID: d1762475-b0e4-44c6-9d15-4d36f2b1a4e0, Call Stack: null, Custom Event ID: -1, Message: Snapshot 'sbu2' deletion for VM 'vm1' was initiated by admin@internal-authz.
.
.
.
2016-06-05 14:14:09,898 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (DefaultQuartzScheduler2) [6067c86] START, GetImageInfoVDSCommand( GetImageInfoVDSCommandPar
ameters:{runAsync='true', storagePoolId='3f76ceea-84c5-4893-8c7d-3bf2d8eea69e', ignoreFailoverLimit='false', storageDomainId='7590d3ff-797a-4042-90f8-4094e6d01665', imageGroupId='181ba273-4
150-4ad0-a209-95184cd1fbd4', imageId='3c27fae4-6ac7-436e-8302-e165632e5108'}), log id: 135a9ad8
2016-06-05 14:14:10,928 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (DefaultQuartzScheduler2) [6067c86] FINISH, GetImageInfoVDSCommand, return: org.ovirt.engine
.core.common.businessentities.storage.DiskImage@7786d8dc, log id: 135a9ad8
2016-06-05 14:14:11,206 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (DefaultQuartzScheduler2) [6067c86] transaction rolled back
2016-06-05 14:14:11,206 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (DefaultQuartzScheduler2) [6067c86] transaction rolled back
2016-06-05 14:14:11,206 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (DefaultQuartzScheduler2) [6067c86] Error invoking callback method 'onFailed' for 'FAILED' command 'c2
b114ea-5b53-4686-86c1-175c97ad19a8'
2016-06-05 14:14:11,206 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (DefaultQuartzScheduler2) [6067c86] Exception: java.lang.NullPointerException
        at org.ovirt.engine.core.utils.ovf.OvfWriter.buildDisk(OvfWriter.java:204) [utils.jar:]
        at org.ovirt.engine.core.utils.ovf.OvfManager.buildOvf(OvfManager.java:118) [utils.jar:]
        at org.ovirt.engine.core.utils.ovf.OvfManager.exportVm(OvfManager.java:29) [utils.jar:]
        at org.ovirt.engine.core.bll.storage.disk.image.ImagesHandler.prepareSnapshotConfigWithAlternateImage(ImagesHandler.java:829) [bll.jar:]
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.lambda$updateVmConfigurationForImageChange$1(RemoveSnapshotSingleDiskLiveCommand.java:394) [bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:204) [utils.jar:]
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.updateVmConfigurationForImageChange(RemoveSnapshotSingleDiskLiveCommand.java:392) [bll.jar:]
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.handleBackwardLiveMerge(RemoveSnapshotSingleDiskLiveCommand.java:382) [bll.jar:]
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.lambda$syncDbRecords$0(RemoveSnapshotSingleDiskLiveCommand.java:303) [bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:204) [utils.jar:]
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.syncDbRecords(RemoveSnapshotSingleDiskLiveCommand.java:291) [bll.jar:]
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.onFailed(RemoveSnapshotSingleDiskLiveCommand.java:447) [bll.jar:]
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback.onFailed(RemoveSnapshotSingleDiskLiveCommandCallback.java:18) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.endCallback(CommandCallbacksPoller.java:56) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109) [bll.jar:]
        at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source) [:1.8.0_91]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_91]
        at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_91]
        at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_91]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_91]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_91]

2016-06-05 14:14:12,249 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler8) [6067c86] Command 'RemoveSnapshot' id: 'fec75b49-40a7-4439-8dde-
d9db3bb221b2' child commands '[b77126ce-f696-4bdc-a0f6-b411841c5ad4, 24b9ea03-2be1-4f0a-a1ab-3ba54c209f58, c2b114ea-5b53-4686-86c1-175c97ad19a8]' executions were completed, status 'FAILED'
2016-06-05 14:14:13,520 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (DefaultQuartzScheduler10) [6067c86] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnap
shotCommand' with failure.
2016-06-05 14:14:13,724 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler10) [6067c86] Correlation ID: 69d26cca, Job ID: d1762475-b0e4-44c6-9d15-4d36f2b1a4e0, Call Stack: null, Custom Event ID: -1, Message: Failed to delete snapshot 'sbu2' for VM 'vm1'.




VDSM log:
--------------------------------
Thread-430::DEBUG::2016-06-05 14:13:30,758::resourceManager::241::Storage.ResourceManager.Request::(grant) ResName=`Storage.7590d3ff-797a-4042-90f8-4094e6d01665`ReqID=`8e5f76d1-4683-46d5-b823-43ad815e8165`::Granted request
Thread-430::DEBUG::2016-06-05 14:13:30,758::task::829::Storage.TaskManager.Task::(resourceAcquired) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::_resourcesAcquired: Storage.7590d3ff-797a-4042-90f8-4094e6d01665 (shared)
Thread-430::DEBUG::2016-06-05 14:13:30,758::task::995::Storage.TaskManager.Task::(_decref) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::ref 1 aborting False
Thread-430::DEBUG::2016-06-05 14:13:30,759::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-3 /usr/bin/sudo -n /usr/sbin/lvm lvs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/3514f0c5a516000c8|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags 7590d3ff-797a-4042-90f8-4094e6d01665 (cwd None)
Thread-430::DEBUG::2016-06-05 14:13:30,961::lvm::288::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0
Thread-430::DEBUG::2016-06-05 14:13:30,963::lvm::463::Storage.LVM::(_reloadlvs) lvs reloaded
Thread-430::WARNING::2016-06-05 14:13:30,963::lvm::609::Storage.LVM::(getLv) lv: 9f680fdb-fe8a-462c-9977-7aec183ec6af not found in lvs vg: 7590d3ff-797a-4042-90f8-4094e6d01665 response
Thread-430::ERROR::2016-06-05 14:13:30,963::task::868::Storage.TaskManager.Task::(_setError) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 875, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3039, in getVolumeInfo
    volUUID=volUUID).getInfo()
  File "/usr/share/vdsm/storage/sd.py", line 619, in produceVolume
    volUUID)
  File "/usr/share/vdsm/storage/blockVolume.py", line 370, in __init__
    manifest = self.manifestClass(repoPath, sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/blockVolume.py", line 72, in __init__
    volUUID)
  File "/usr/share/vdsm/storage/volume.py", line 80, in __init__
    self.validate()
  File "/usr/share/vdsm/storage/blockVolume.py", line 156, in validate
    raise se.VolumeDoesNotExist(self.volUUID)
VolumeDoesNotExist: Volume does not exist: ('9f680fdb-fe8a-462c-9977-7aec183ec6af',)
Thread-430::DEBUG::2016-06-05 14:13:30,989::task::887::Storage.TaskManager.Task::(_run) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::Task._run: 18cc50f9-e374-46bb-9389-e48f192628c8 ('7590d3ff-797a-4042-90f8-4094e6d01665', '3f76ceea-84c5-4893-8c7d-3bf2d8eea69e', '9c3f4477-4041-4c4f-8add-b547db25c839', '9f680fdb-fe8a-462c-9977-7aec183ec6af') {} failed - stopping task
Thread-430::DEBUG::2016-06-05 14:13:30,989::task::1248::Storage.TaskManager.Task::(stop) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::stopping in state preparing (force False)
Thread-430::DEBUG::2016-06-05 14:13:30,989::task::995::Storage.TaskManager.Task::(_decref) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::ref 1 aborting True
Thread-430::INFO::2016-06-05 14:13:30,989::task::1173::Storage.TaskManager.Task::(prepare) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::aborting: Task is aborted: 'Volume does not exist' - code 201
Thread-430::DEBUG::2016-06-05 14:13:30,990::task::1178::Storage.TaskManager.Task::(prepare) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::Prepare: aborted: Volume does not exist
Thread-430::DEBUG::2016-06-05 14:13:30,990::task::995::Storage.TaskManager.Task::(_decref) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::ref 0 aborting True
Thread-430::DEBUG::2016-06-05 14:13:30,990::task::930::Storage.TaskManager.Task::(_doAbort) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::Task._doAbort: force False
Thread-430::DEBUG::2016-06-05 14:13:30,990::resourceManager::989::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-430::DEBUG::2016-06-05 14:13:30,990::task::597::Storage.TaskManager.Task::(_updateState) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::moving from state preparing -> state aborting
Thread-430::DEBUG::2016-06-05 14:13:30,990::task::552::Storage.TaskManager.Task::(__state_aborting) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::_aborting: recover policy none
Thread-430::DEBUG::2016-06-05 14:13:30,990::task::597::Storage.TaskManager.Task::(_updateState) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::moving from state aborting -> state failed
Thread-430::DEBUG::2016-06-05 14:13:30,991::resourceManager::952::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {'Storage.7590d3ff-797a-4042-90f8-4094e6d01665': < ResourceRef 'Storage.7590d3ff-797a-4042-90f8-4094e6d01665', isValid: 'True' obj: 'None'>}
Thread-430::DEBUG::2016-06-05 14:13:30,991::resourceManager::989::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}

Comment 12 Red Hat Bugzilla Rules Engine 2016-06-05 15:02:18 UTC
Target release should be placed once a package build is known to fix a issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for a oVirt release.

Comment 13 Kevin Alon Goldblatt 2016-06-05 15:09:34 UTC
Created attachment 1164915 [details]
vdsm server and engine logs

Added logs

Comment 14 Ala Hino 2016-06-05 15:43:29 UTC
Not related.
This issue introduced after commit 6e55d137d5ca and fixed in this commit a4602dba0f6985fd812871742d996c4a787af7bf.

Comment 15 Kevin Alon Goldblatt 2016-06-15 15:10:26 UTC
Testedwith the following code:
------------------------------------------
rhevm-4.0.0.4-0.1.el7ev.noarch
vdsm-4.18.2-0.el7ev.x86_64




Tested with the following scenario:
-------------------------------------
1.Created Vm with 3 disks, 1 iscsi - preallocated, 1 iscsi - thin and 1 nfs - thin
2. Created snapshots sd1, sd2, sd3 and sd4 while VM is dowm
3. Deleted sd2 successfully - snapshot merge successfull
4. Deleted sd3 successfully - snapshot merge successfull


Moving to VERIFIED!

Comment 16 Sandro Bonazzola 2016-07-05 07:45:23 UTC
oVirt 4.0.0 has been released, closing current release.