Created attachment 706137 [details]
logs

Description of problem:
When the VDSM DeleteImage command fails, VDSM reports the failure to the engine through task polling (task ended with failure code 100), but the engine ignores it and removes the disks from its database. This creates a sync loss between the engine and the hypervisor: in my case I wondered why my storage still consumed all its space although I had deleted all my VMs and disks.

Version-Release number of selected component (if applicable):
rhevm-backend-3.2.0-10.10.beta1.el6ev.noarch
vdsm-4.10.2-10.0.el6ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Have a VM with one or more disks.
2. Remove the disks.
3. The lvremove command issued by VDSM fails, and VDSM answers the engine with 'code 100' (related to bug #918469).

Actual results:
The engine ignores the 'code 100' answer from VDSM and deletes the disks from its database.

Expected results:
The engine should act on the VDSM 'code 100' message and:
1) not delete the disks from its database
2) notify the user that the VDSM DeleteImage command failed.

Additional info:
See attached logs.
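To make the expected behavior concrete, here is a minimal sketch of a task-polling path that treats a non-zero task status code such as 100 as a failure and keeps the disk record. This is not actual oVirt engine code; the poll-result layout and the disk_db/notifier objects are hypothetical stand-ins.

GENERAL_EXCEPTION = 100  # the "General Exception" code seen in the task status

def handle_delete_image_result(task_status, disk_id, disk_db, notifier):
    """Drop the disk from the database only if the VDSM task succeeded."""
    code = task_status.get("code", GENERAL_EXCEPTION)
    if code != 0:
        # Expected behavior from this report: keep the DB row and tell the user.
        notifier.warn("DeleteImage failed on the host (code %s); keeping disk %s"
                      % (code, disk_id))
        return False
    disk_db.remove(disk_id)
    return True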
One would expect VDSM to return something better than 100 (which, AFAIK, is "General Exception").
The attached vdsm log is irrelevant (I'm guessing it rolled over). Please attach the proper log.

Wrt the General Exception: that is what vdsm returns when an unexpected error is thrown in dispatcher.py. Why that response doesn't contain the message from the unhandled exception I do not know, but that is not storage related.

Wrt the specific error that is not handled in this case, I cannot say without logs.
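For readers unfamiliar with that pattern, the dispatcher-level catch-all described above looks roughly like the sketch below (an illustration only, not the real vdsm dispatcher.py): any unexpected exception is logged locally and collapsed into a generic code-100 response, which is also why the specific message may never reach the engine.

import logging

GENERAL_EXCEPTION = {"code": 100, "message": "General Exception"}

def dispatch(verb, *args, **kwargs):
    """Run a verb; translate any unexpected error into a generic code 100."""
    try:
        return verb(*args, **kwargs)
    except Exception:
        # The specific traceback is only logged locally; the caller (the
        # engine) just sees the generic "General Exception" status.
        logging.exception("Unhandled exception in %s",
                          getattr(verb, "__name__", verb))
        return {"status": GENERAL_EXCEPTION}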
Created attachment 709929 [details] vdsm log
There is no 'lvremove' command in the attached log. Again, probably the wrong log?
Created attachment 714004 [details] vdsm log
Created attachment 714005 [details] engine log
The vdsm.log is still the wrong log; it does not contain the General Exception. There is an lvremove call that failed due to storage issues, but by design that does not cause the operation to fail (as can be seen from the return value of the command). The referenced bug 918469 has already been fixed. If you see this again, please reopen.
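The "by design" behavior described here roughly follows this pattern (a sketch under assumptions, not actual VDSM code; the command layout and helper name are made up): the lvremove failure is logged as a warning, but the surrounding delete operation continues and its return value is unaffected.

import logging
import subprocess

def try_lvremove(vg_name, lv_name):
    """Attempt to remove an LV; log a warning on failure but do not raise."""
    cmd = ["lvremove", "-f", "%s/%s" % (vg_name, lv_name)]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        # By design the caller keeps going; the failure only shows up in the log.
        logging.warning("lvremove failed (rc=%s): %s",
                        proc.returncode, proc.stderr.strip())
    return proc.returncode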
Created attachment 715886 [details]
engine log

Attaching the correct engine log. Please refer to:

2013-03-06 17:02:46,306 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-37) [4f5962cb] Error code GeneralException and error message VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = 'module' object has no attribute 'CMD_LOWPRIO'

The disks are removed from the database but not removed from the hypervisor.
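For reference, the quoted error is Python's standard AttributeError message for a missing module attribute (here CMD_LOWPRIO), raised inside the HSMGetAllTasksStatuses path and wrapped as a GeneralException. A trivial illustration of where such a message comes from (the module name below is a stand-in, not the actual vdsm module):

import types

constants = types.ModuleType("constants")  # a module with no CMD_LOWPRIO defined

try:
    constants.CMD_LOWPRIO
except AttributeError as exc:
    # On Python 2 (used by VDSM at the time) this prints exactly:
    #   'module' object has no attribute 'CMD_LOWPRIO'
    print(exc)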
(In reply to comment #9)
> Created attachment 715886 [details]
> engine log.
>
> attaching the correct engine log.
>
> please refer to:
>
> 2013-03-06 17:02:46,306 ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase]
> (QuartzScheduler_Worker-37) [4f5962cb] Error code GeneralException and error
> message
> VDSGenericException: VDSErrorException: Failed to
> HSMGetAllTasksStatusesVDS, error = 'module' object has no attribute
> 'CMD_LOWPRIO'

As noted above, this problem was fixed in bug 918469.

*** This bug has been marked as a duplicate of bug 918469 ***