Bug 689253

Summary: VDSM: deleting snapshot during vdsmd restart causes task to fail -> trying to delete snapshot after failure will cause VM to get stuck on image locked
Product: Red Hat Enterprise Linux 6
Reporter: Dafna Ron <dron>
Component: vdsm
Assignee: Igor Lvovsky <ilvovsky>
Status: CLOSED CURRENTRELEASE
QA Contact: Dafna Ron <dron>
Severity: high
Priority: high
Version: 6.1
CC: abaron, bazulay, danken, iheim, lpeer, tdosek, ykaul
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: vdsm-4.9-58.el6
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2011-08-19 15:18:29 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Blocks: 689221
Attachments:
Description   Flags
logs          none
logs_fixed    none

Description Dafna Ron 2011-03-20 15:30:07 UTC
Created attachment 486476
logs

Description of problem:

Restarting vdsmd while a snapshot is being deleted causes the delete task to fail.
If the image was already deleted in storage, then trying to delete the snapshot again causes the VM to get stuck in image locked.


Version-Release number of selected component (if applicable):

ic105

vdsm-4.9-54.el6.x86_64
vdsm-debuginfo-4.9-51.el6.x86_64
vdsm-cli-4.9-54.el6.x86_64
vdsm-hook-vhostmd-4.9-53.el6.x86_64

qemu-kvm-0.12.1.2-2.146.el6.x86_64
qemu-img-0.12.1.2-2.146.el6.x86_64
gpxe-roms-qemu-0.9.7-6.4.el6.noarch

How reproducible:
100%

Steps to Reproduce:
1. create 3 snapshots
2. delete the middle snapshot
3. wait 15-20 seconds, then restart vdsmd
4. try to delete the same snapshot again
  
Actual results:

The task fails the first time. The second time, the VM gets stuck on image locked.

Expected results:

We should implement roll-forward.
A bug for this was opened for the backend; we also need the same from vdsm.
Bug 689250 was opened for RHEL5 on the same issue with different results.
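
One way to read the roll-forward request: when the delete is retried and the image is already gone from storage, finish the task and release the lock instead of failing. A minimal Python sketch of that idea follows, using a toy in-memory store. `ToyStorage`, `delete_snapshot` and the `interrupt` flag are hypothetical names for illustration only and are not vdsm code.

```python
class ToyStorage:
    """In-memory stand-in for storage plus the image lock (hypothetical)."""

    def __init__(self, snapshots):
        self.snapshots = list(snapshots)
        self.image_locked = False


def delete_snapshot(storage, name, interrupt=False):
    """Delete `name`; if an earlier attempt already removed it, roll forward."""
    storage.image_locked = True
    if name in storage.snapshots:
        storage.snapshots.remove(name)
        if interrupt:
            # Simulates vdsmd restarting after the image is gone but before
            # the task finishes: the lock is left set.
            raise RuntimeError("task interrupted by restart")
    # Roll-forward: the image is gone either way, so complete the task
    # and release the lock rather than reporting a failure.
    storage.image_locked = False
    return "deleted"


storage = ToyStorage(["snap1", "snap2", "snap3"])
try:
    delete_snapshot(storage, "snap2", interrupt=True)  # first attempt fails
except RuntimeError:
    pass
locked_after_failure = storage.image_locked  # lock left behind by the failure

result = delete_snapshot(storage, "snap2")  # retry rolls forward
```

The toy version cannot tell an interrupted delete from a request for a snapshot that never existed; a real implementation would presumably need a persisted task record to make that distinction.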

Additional info: logs (the error from the weekend is in the attached log as well)

I left the VM over the weekend. Eventually you get this error:

Thread-24::DEBUG::2011-03-20 08:44:27,756::task::491::TaskManager.Task::(_debug) Task 2cfd1ac0-2734-43e7-a575-25581ae96274: moving from state init -> state preparing
Thread-24::ERROR::2011-03-20 08:44:27,931::spm::120::Storage.SPM.Secure::(run) SPM: spm method call rejected: Not SPM!!!  method: public_mergeSnapshots, called by: _run
Thread-24::ERROR::2011-03-20 08:44:27,931::task::854::TaskManager.Task::(_setError) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 862, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/spm.py", line 121, in run
    raise se.SpmStatusError(self.name)
SpmStatusError: Not SPM: ('public_mergeSnapshots',)
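
For triage, entries like the ones above can be filtered by level. The sketch below assumes the `::`-separated layout seen in these lines (thread, level, timestamp, module, line number, logger, then `(function) message`). `parse_line` is a hypothetical helper, and other vdsm versions may format their logs differently.

```python
import re
from datetime import datetime

# Sample entry copied from the log excerpt in this report.
LINE = (
    "Thread-24::ERROR::2011-03-20 08:44:27,931::spm::120::"
    "Storage.SPM.Secure::(run) SPM: spm method call rejected: "
    "Not SPM!!!  method: public_mergeSnapshots, called by: _run"
)


def parse_line(line):
    """Split one log entry into fields; return None for non-entry lines
    such as traceback lines."""
    parts = line.split("::", 6)
    if len(parts) != 7:
        return None
    thread, level, stamp, module, lineno, logger, rest = parts
    match = re.match(r"\((\w+)\) (.*)", rest)
    if match is None:
        return None
    return {
        "thread": thread,
        "level": level,
        "time": datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"),
        "module": module,
        "lineno": int(lineno),
        "logger": logger,
        "function": match.group(1),
        "message": match.group(2),
    }


entry = parse_line(LINE)
skipped = parse_line("Traceback (most recent call last):")
```

Keeping only entries whose `level` is `ERROR` narrows a long log down to the rejected SPM call and the task error.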

Comment 3 Dafna Ron 2011-03-22 14:09:14 UTC
Created attachment 486803
logs_fixed

attached logs again - fixed

Comment 4 Tomas Dosek 2011-04-11 08:52:10 UTC
Verified on vdsm-4.9-58.el6: the VM no longer hangs in locked state, an appropriate message is shown to the user in rhevm, and the vdsmd service restarts smoothly and ends successfully.