Bug 878970
| Summary: | [TEXT][Live Storage Migration]: unclear logs for failure on live storage migration | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Dafna Ron <dron> | ||||
| Component: | vdsm | Assignee: | Federico Simoncelli <fsimonce> | ||||
| Status: | CLOSED CANTFIX | QA Contact: | Dafna Ron <dron> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 3.1.0 | CC: | abaron, bazulay, hateya, iheim, jkt, lpeer | ||||
| Target Milestone: | --- | ||||||
| Target Release: | 3.3.0 | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | storage | ||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2013-07-09 11:32:27 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1019461 | ||||||
| Attachments: |
There is nothing to do here to protect against this. We see an exception only because the VM was running on the SPM host. If the VM and the SPM had been on different hosts, the call would have succeeded and the VM would have been left running on an LV that had been removed (a severe issue). The only actor able to detect these mistakes is the backend. VDSM reported as much information as it could: "Cannot remove Logical Volume" (because the LV is in use). The only thing we could do here is an ugly and racy check to verify that the LV is not in use, and it would report the same error (Cannot remove Logical Volume). In any case, VDSM is not aware of the flow that the engine is conducting (it just sees a deleteImage request), so it cannot report any additional information.
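The "ugly and racy check" mentioned in the comment above could look roughly like the following minimal Python sketch. It is not VDSM code: the helper names `lv_is_open` and `remove_lvs_with_check`, and the injected `get_lv_attr`/`lvremove` callables, are hypothetical stand-ins for running `lvs -o lv_attr` and `lvremove`. It assumes the `lvs(8)` convention that the 6th `lv_attr` character is `o` when the device is open. Because that flag reflects only the local host's device-mapper state, such a check would catch a VM running on the SPM host but not one running on another host, and an LV can still be opened between the check and the removal.

```python
from typing import Callable, List


class CannotRemoveLogicalVolume(Exception):
    """Simplified stand-in for VDSM's exception of the same name."""

    def __init__(self, vg_name: str, lv_names: str) -> None:
        super().__init__("Cannot remove Logical Volume: (%r, %r)" % (vg_name, lv_names))


def lv_is_open(lv_attr: str) -> bool:
    """True if position 6 of lv_attr reports the device as (o)pen.

    This only reflects the open count on the host running the query.
    """
    return len(lv_attr) >= 6 and lv_attr[5] == "o"


def remove_lvs_with_check(
    vg_name: str,
    lv_names: List[str],
    get_lv_attr: Callable[[str, str], str],  # hypothetical: wraps `lvs -o lv_attr`
    lvremove: Callable[[str, List[str]], None],  # hypothetical: wraps `lvremove`
) -> None:
    # Check: refuse to remove any LV that is currently open on THIS host.
    in_use = [lv for lv in lv_names if lv_is_open(get_lv_attr(vg_name, lv))]
    if in_use:
        # Same error as a failed lvremove, so the caller learns nothing new.
        raise CannotRemoveLogicalVolume(vg_name, str(tuple(in_use)))
    # Race window: a process on this host, or a VM on another host,
    # can open the LV after the check above and before this call.
    lvremove(vg_name, lv_names)
```

Either way the caller receives the same `CannotRemoveLogicalVolume` error, which is why such a check would not give the engine any additional information.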
Created attachment 649305 (logs)

Description of problem:
Live storage migration failed to remove a volume because of locks. The only information we are given is the error CannotRemoveLogicalVolume. The user will find it very difficult to debug the issue if we do not assign error codes to the failures.

Version-Release number of selected component (if applicable):
si24.4
vdsm-4.9.6-44.0.el6_3.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Run multiple live storage migrations (run several VMs -> move the disk of every VM).

Actual results:
We get several failures because of locks on volumes, with an unclear error about why we fail.

Expected results:
We should add error codes so that the engine can log the different failures in the event log and help the user understand why the operation failed.

Additional info:
logs

Thread-17947::ERROR::2012-11-21 17:05:25,021::task::853::TaskManager.Task::(_setError) Task=`0da0f387-b098-4891-8ecf-1a625478cc11`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1349, in deleteImage
    dom.deleteImage(sdUUID, imgUUID, volsByImg)
  File "/usr/share/vdsm/storage/blockSD.py", line 945, in deleteImage
    deleteVolumes(sdUUID, toDel)
  File "/usr/share/vdsm/storage/blockSD.py", line 177, in deleteVolumes
    lvm.removeLVs(sdUUID, vols)
  File "/usr/share/vdsm/storage/lvm.py", line 1010, in removeLVs
    raise se.CannotRemoveLogicalVolume(vgName, str(lvNames))
CannotRemoveLogicalVolume: Cannot remove Logical Volume: ('d40978c8-3fab-483b-b786-2f1e1c5cf130', "('49fb4b52-c355-4274-8ed0-fff75b236997', '17879e10-6ea9-4c32-bd7a-9b1bd71ff3ee')")
Thread-17947::DEBUG::2012-11-21 17:05:25,023::task::872::TaskManager.Task::(_run) Task=`0da0f387-b098-4891-8ecf-1a625478cc11`::Task._run: 0da0f387-b098-4891-8ecf-1a625478cc11 ('d40978c8-3fab-483b-b786-2f1e1c5cf130', 'edf0ee04-0cc2-4e13-877d-1e89541aea55', '963ec286-65fa-474a-8ea7-4c83622470f0', 'false', 'false') {} failed - stopping task
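The "Expected results" ask for distinct error codes that the engine can turn into event-log entries. One possible shape for that request is sketched below in Python, under loud assumptions: the class names `StorageError` and `LogicalVolumeInUse`, the codes 1000/1001/1002, the response dictionary layout and the `EVENT_LOG_TEXT` table are all hypothetical and are not VDSM's or the engine's actual exceptions, codes or wire format. As the developer comment above points out, VDSM could only raise the more specific error from a local, racy check, so this illustrates the request rather than a workable fix.

```python
from typing import Dict


class StorageError(Exception):
    """Hypothetical base class: every storage failure carries a numeric code."""

    code = 1000  # hypothetical generic code
    message = "General storage error"

    def __init__(self, *details: str) -> None:
        super().__init__(self.message, *details)


class CannotRemoveLogicalVolume(StorageError):
    code = 1001  # hypothetical
    message = "Cannot remove Logical Volume"


class LogicalVolumeInUse(CannotRemoveLogicalVolume):
    # Subclass of the generic error, so existing handlers still match it.
    code = 1002  # hypothetical
    message = "Cannot remove Logical Volume: volume is in use"


def to_response(err: StorageError) -> Dict[str, Dict[str, object]]:
    """Hypothetical host-side reply: the code travels with the message."""
    return {"status": {"code": err.code, "message": str(err)}}


# Hypothetical engine-side table mapping codes to event-log text.
EVENT_LOG_TEXT: Dict[int, str] = {
    1001: "Failed to remove a logical volume of the image.",
    1002: "Failed to remove a logical volume that is still in use.",
}


def event_log_message(code: int) -> str:
    """Pick specific event-log text for a code, with a generic fallback."""
    return EVENT_LOG_TEXT.get(code, "Storage operation failed (code %d)." % code)
```

Making the specific error a subclass of the generic one keeps older callers that only know `CannotRemoveLogicalVolume` working, while a newer engine can branch on the finer-grained code.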