Bug 1253975

Summary: [vdsm] extendVolumeSize task is not cleared in case of a live merge failure for a volume that was extended
Product: [oVirt] vdsm Reporter: Elad <ebenahar>
Component: General Assignee: Adam Litke <alitke>
Status: CLOSED CURRENTRELEASE QA Contact: Elad <ebenahar>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.17.0 CC: acanan, amureini, bazulay, bugs, ebenahar, ecohen, gklein, lsurette, mgoldboi, rbalakri, tnisan, ycui, yeylon, ylavi
Target Milestone: ovirt-3.6.0-rc3 Flags: rule-engine: ovirt-3.6.0+
rule-engine: blocker+
ylavi: planning_ack+
rule-engine: devel_ack+
rule-engine: testing_ack+
Target Release: 4.17.8   
Hardware: x86_64   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-04 13:41:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Screenshot from Webadmin none

Description Elad 2015-08-16 09:17:44 UTC
Created attachment 1063474
Screenshot from Webadmin

Description of problem:

Live merge operation fails on VDSM as reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1253974
The operation is called over and over again, and the task is never cleared on VDSM. This causes endless VDSM failures that flood the event log in the Webadmin, as seen in the attached screenshot.

Version-Release number of selected component (if applicable):
ovirt-3.6.0-5
vdsm-4.17.0-1239.git6575e3f.el7.noarch
libvirt-daemon-1.2.8-16.el7_1.3.x86_64
qemu-kvm-ev-2.1.2-23.el7_1.6.1.x86_64
sanlock-3.2.2-2.el7.x86_64
selinux-policy-3.13.1-23.el7_1.13.noarch
python-2.7.5-18.el7_1.1.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a VM with a 1 GB preallocated disk (probably reproducible with a thin disk as well; not tested) on a block-based domain.
2. Create a snapshot.
3. Extend the disk to 2 GB.
4. Start the VM.
5. Delete the snapshot.


Actual results:
Live merge fails as reported in: 

acff4189-cfc2-404d-9c03-ddfb33e36104::ERROR::2015-08-16 10:45:58,925::task::866::Storage.TaskManager.Task::(_setError) Task=`acff4189-cfc2-404d-9c03-ddfb33e36104`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 332, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1304, in extendVolumeSize
    .produceVolume(imgUUID, volUUID).extendSize(int(newSize))
  File "/usr/share/vdsm/storage/volume.py", line 552, in extendSize
    raise se.VolumeNonWritable(self.volUUID)
VolumeNonWritable: Volume cannot be access to writes: ["u'46d93a59-0ec1-4c7c-9f25-816be8d0760b'"]


[root@green-vdsc mnt]# vdsClient -s 0 getAllTasks
1169f9ef-1560-4d5d-aa72-03da356e1985 :
         verb = extendVolumeSize
         code = 100
         state = recovered
         tag = spm
         result = 
         message = Volume cannot be access to writes: ["u'b7ee712c-d029-452e-bbb4-1ee3b451f97b'"]
         id = 1169f9ef-1560-4d5d-aa72-03da356e1985
acff4189-cfc2-404d-9c03-ddfb33e36104 :
         verb = extendVolumeSize
         code = 100
         state = recovered
         tag = spm
         result = 
         message = Volume cannot be access to writes: ["u'46d93a59-0ec1-4c7c-9f25-816be8d0760b'"]
         id = acff4189-cfc2-404d-9c03-ddfb33e36104

The extendVolumeSize task runs and fails over and over.
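
For anyone triaging a host in this state, the getAllTasks output above can be filtered for leftover extendVolumeSize tasks. The following is only a minimal sketch: the function names are hypothetical, the parser assumes the "task-id :" header and "key = value" layout shown in this report, and it assumes a code of 0 means success (the failed tasks here report code 100).

```python
# Sample copied from the `vdsClient -s 0 getAllTasks` output in this report.
SAMPLE = """
1169f9ef-1560-4d5d-aa72-03da356e1985 :
         verb = extendVolumeSize
         code = 100
         state = recovered
         tag = spm
         result =
         message = Volume cannot be access to writes: ["u'b7ee712c-d029-452e-bbb4-1ee3b451f97b'"]
         id = 1169f9ef-1560-4d5d-aa72-03da356e1985
acff4189-cfc2-404d-9c03-ddfb33e36104 :
         verb = extendVolumeSize
         code = 100
         state = recovered
         tag = spm
         result =
         message = Volume cannot be access to writes: ["u'46d93a59-0ec1-4c7c-9f25-816be8d0760b'"]
         id = acff4189-cfc2-404d-9c03-ddfb33e36104
"""


def parse_all_tasks(text):
    """Parse getAllTasks-style text into {task_id: {field: value}}.

    Hypothetical helper: assumes the layout shown in this bug report.
    """
    tasks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # Header line such as "<task-id> :" starts a new task.
            current = line[:-1].strip()
            tasks[current] = {}
        elif current is not None and "=" in line:
            # Field line such as "verb = extendVolumeSize".
            key, _, value = line.partition("=")
            tasks[current][key.strip()] = value.strip()
    return tasks


def failed_extend_tasks(tasks):
    """Return sorted IDs of extendVolumeSize tasks with a non-zero code.

    Assumption: code "0" means the task succeeded.
    """
    return sorted(
        task_id
        for task_id, info in tasks.items()
        if info.get("verb") == "extendVolumeSize" and info.get("code") != "0"
    )


if __name__ == "__main__":
    for task_id in failed_extend_tasks(parse_all_tasks(SAMPLE)):
        print(task_id)
```

Both tasks listed above match this filter, since each has verb extendVolumeSize and code 100.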


Expected results:
In case of a live merge failure, the task should be stopped and cleared.
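
Until the task is cleaned up automatically, the leftover tasks can presumably be stopped and cleared by hand. The sketch below only builds the command lines and does not run them. It assumes the stopTask and clearTask verbs are available in the installed vdsClient, and the helper name is hypothetical. This is a manual workaround sketch, not the fix for this bug.

```python
def cleanup_commands(task_ids, host="0", secure=True):
    """Build vdsClient command lines that stop and then clear each task.

    Hypothetical helper. Assumes `vdsClient [-s] <host> stopTask <id>` and
    `vdsClient [-s] <host> clearTask <id>` exist in the installed vdsClient.
    """
    base = ["vdsClient"] + (["-s"] if secure else []) + [host]
    commands = []
    for task_id in task_ids:
        commands.append(base + ["stopTask", task_id])
        commands.append(base + ["clearTask", task_id])
    return commands


if __name__ == "__main__":
    # Task IDs taken from the getAllTasks output in this report.
    stuck = [
        "1169f9ef-1560-4d5d-aa72-03da356e1985",
        "acff4189-cfc2-404d-9c03-ddfb33e36104",
    ]
    for cmd in cleanup_commands(stuck):
        # Print for review; run manually on the SPM host if appropriate.
        print(" ".join(cmd))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; each line should be reviewed before it is run on the SPM host.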

Additional info:
- Screenshot from Webadmin
- sosreport from host: http://file.tlv.redhat.com/ebenahar/sosreport-green-vdsc.qa.lab.tlv.redhat.com-20150816120535.tar.xz
- sosreport from engine: http://file.tlv.redhat.com/ebenahar/sosreport-RHEL6.7Server-20150816120920.tar.xz

Comment 1 Yaniv Lavi 2015-10-07 13:22:57 UTC
Can you please recreate on the latest build?

Comment 2 Elad 2015-10-11 15:31:00 UTC
Using the latest build, the bug does not reproduce.
Live merge works as expected after performing the steps described in the bug description.

Tested using:
3.6.0-15 
vdsm-4.17.8-1.el7ev.noarch
rhevm-3.6.0-0.18.el6.noarch

Comment 3 Sandro Bonazzola 2015-11-04 13:41:10 UTC
oVirt 3.6.0 was released on November 4th, 2015 and should fix this issue.
If problems still persist, please open a new BZ and reference this one.