Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 878970

Summary: [TEXT][Live Storage Migration]: unclear logs for failure on live storage migration
Product: Red Hat Enterprise Virtualization Manager
Reporter: Dafna Ron <dron>
Component: vdsm
Assignee: Federico Simoncelli <fsimonce>
Status: CLOSED CANTFIX
QA Contact: Dafna Ron <dron>
Severity: high
Docs Contact:
Priority: high
Version: 3.1.0
CC: abaron, bazulay, hateya, iheim, jkt, lpeer
Target Milestone: ---
Target Release: 3.3.0
Hardware: x86_64
OS: Linux
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-07-09 11:32:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1019461    
Attachments: logs (flags: none)

Description Dafna Ron 2012-11-21 16:53:30 UTC
Created attachment 649305 [details]
logs

Description of problem:

Live storage migration failed to remove a volume because of locks.

The only information we are given is a CannotRemoveLogicalVolume error.

The user will find it very difficult to debug the issue if we do not assign error codes to the failures.
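As a hypothetical sketch (not the actual VDSM error hierarchy, and the numeric codes are made up for illustration), attaching a stable code to each storage exception would let the engine distinguish failure causes without parsing message text:

```python
class StorageException(Exception):
    """Base class carrying a stable numeric code the engine can match on."""
    code = 0
    message = "Storage error"

    def __init__(self, *args):
        self.args = args

    def __str__(self):
        return "%s: %s %s" % (self.code, self.message, self.args)


class CannotRemoveLogicalVolume(StorageException):
    code = 551  # hypothetical value; real codes would come from a registry
    message = "Cannot remove Logical Volume"


class CannotRemoveLogicalVolumeInUse(CannotRemoveLogicalVolume):
    code = 552  # hypothetical value
    message = "Cannot remove Logical Volume: volume is in use"


# The engine could then dispatch on the code instead of the message text:
try:
    raise CannotRemoveLogicalVolumeInUse("vg-uuid", "['lv1', 'lv2']")
except StorageException as e:
    print(e.code)  # 552
```

With distinct codes, the engine event log could say "volume is in use" instead of the generic message, which is what this report asks for.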

Version-Release number of selected component (if applicable):

si24.4
vdsm-4.9.6-44.0.el6_3.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Run multiple live storage migrations (start several VMs, then move the disk of each VM)
  
Actual results:

We see several failures because of locks on volumes, with an unclear error that does not explain why we failed.


Expected results:

We should add error codes so that the engine can log the different failures in the event log and help the user understand why the operation failed.

Additional info: logs

Thread-17947::ERROR::2012-11-21 17:05:25,021::task::853::TaskManager.Task::(_setError) Task=`0da0f387-b098-4891-8ecf-1a625478cc11`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1349, in deleteImage
    dom.deleteImage(sdUUID, imgUUID, volsByImg)
  File "/usr/share/vdsm/storage/blockSD.py", line 945, in deleteImage
    deleteVolumes(sdUUID, toDel)
  File "/usr/share/vdsm/storage/blockSD.py", line 177, in deleteVolumes
    lvm.removeLVs(sdUUID, vols)
  File "/usr/share/vdsm/storage/lvm.py", line 1010, in removeLVs
    raise se.CannotRemoveLogicalVolume(vgName, str(lvNames))
CannotRemoveLogicalVolume: Cannot remove Logical Volume: ('d40978c8-3fab-483b-b786-2f1e1c5cf130', "('49fb4b52-c355-4274-8ed0-fff75b236997', '17879e10-6ea9-4c32-bd7a-9b1bd71ff3ee')")
Thread-17947::DEBUG::2012-11-21 17:05:25,023::task::872::TaskManager.Task::(_run) Task=`0da0f387-b098-4891-8ecf-1a625478cc11`::Task._run: 0da0f387-b098-4891-8ecf-1a625478cc11 ('d40978c8-3fab-483b-b786-2f1e1c5cf130', 'edf0ee04-0cc2-4e13-877d-1e89541aea55', '963ec286-65fa-474a-8ea7-4c83622470f0', 'false', 'false') {} failed - stopping task

Comment 1 Federico Simoncelli 2013-07-09 11:32:27 UTC
There's nothing to do here to protect against this.

We see an exception only because the VM was running on the SPM host. If the two were on different hosts the call would have been successful and the VM would have been running on an LV that was removed (severe issue).

The only actor able to detect these mistakes is the engine backend.

VDSM reported as much information as it could: "Cannot remove Logical Volume" (because the volume is in use).

The only thing we could do here is an ugly and racy check to verify that the LV is not in use, and then report the same error (Cannot remove Logical Volume).
In any case, VDSM is not aware of the flow that the engine is conducting (it just sees a deleteImage request), so it cannot report any additional information.
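For illustration only, the kind of racy in-use check described above could look like the sketch below. It assumes standard lvs(8) output; `attr_is_open` is a hypothetical helper that inspects the device-open flag (position 6 of the lv_attr string). The race remains: the LV can be opened or closed between this check and the actual lvremove.

```python
import subprocess


def attr_is_open(lv_attr):
    # lv_attr is LVM's 10-character state string, e.g. '-wi-ao----';
    # position 6 (index 5) is 'o' when the LV's device is open (in use).
    return len(lv_attr) >= 6 and lv_attr[5] == 'o'


def lv_is_open(vg, lv):
    """Racy best-effort check whether an LV is in use.

    The state can change between this query and any subsequent lvremove,
    so this can only improve the error message, not prevent the failure.
    """
    out = subprocess.check_output(
        ["lvs", "--noheadings", "-o", "lv_attr", "%s/%s" % (vg, lv)])
    return attr_is_open(out.decode().strip())


print(attr_is_open('-wi-ao----'))  # True: an open (in-use) LV
print(attr_is_open('-wi-a-----'))  # False: a closed LV
```

Even with such a check, VDSM could only repeat "Cannot remove Logical Volume (in use)"; it still would not know which engine flow issued the deleteImage request.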