+++ This bug was initially created as a clone of Bug #960635 +++ Description of problem: I ran a vm on hsm host and restarted the vdsm on the hsm during the create volume step. the live storage migration command fails and we report it 5 (not counting the INFO which also reports it) (see more info) Version-Release number of selected component (if applicable): sf15 vdsm-4.10.2-17.0.el6ev.x86_64 How reproducible: 100% Steps to Reproduce: 1. create and run a vm on iscsi pool with 2 hosts 2. live migrate the vm's disk 3. restart the vdsm on the hsm when task createVolume appears in spm Actual results: we fail the move and engine reports it 5 times in the log which would make debugging deficult in case we have multiple vms Expected results: we should diminish the amount of reported errors in the log Additional info: logs 2013-05-07 16:14:10,043 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-46) [105ad651] Error code imageErr and error message VDSGenericException: VDSErrorException: Failed to VmReplicateDiskStartVDS, e rror = Drive image file %s could not be found 2013-05-07 16:14:10,044 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-4-thread-46) [105ad651] Command VmReplicateDiskStartVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to VmReplicateDiskStartVDS, error = Drive image file %s could not be found 2013-05-07 16:14:10,044 ERROR [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (pool-4-thread-46) [105ad651] Command org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand throw Vdc Bll exception. With error message VdcBLLException: Drive image file %s could not be found 2013-05-07 16:14:10,070 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-46) [105ad651] Error code imageErr and error message VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS, error = Drive image file %s could not be found 2013-05-07 16:14:10,070 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-4-thread-46) [105ad651] Command VmReplicateDiskFinishVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS, error = Drive image file %s could not be found --- Additional comment from Alissa on 2013-07-09 08:04:05 EDT --- First problem that needs a fix related to the specific error that the error doesn't contain the name of the image file, it's not properly replaced and will make troubleshooting more difficult: "Drive image file %s could not be found". Second - regarding the multiple logging, it looks as this is not a storage related problem, but rather happens for many different commands. Perhaps the bug should be splitted to handle the first issue in storage area, and the multiple logging to be handled on infra level - Ayal, your input on this? The multiple logging is done in the following places: BrokerCommandBase --> ProceedProxyReturnValue VDSCommandBase --> setVdsRuntimeError --- Additional comment from Ayal Baron on 2013-07-09 09:52:12 EDT --- (In reply to Alissa from comment #1) > First problem that needs a fix related to the specific error that the error > doesn't contain the name of the image file, it's not properly replaced and > will make troubleshooting more difficult: > "Drive image file %s could not be found". This means that we're lacking this information in the log? > > Second - regarding the multiple logging, it looks as this is not a storage > related problem, but rather happens for many different commands. > Perhaps the bug should be splitted to handle the first issue in storage > area, and the multiple logging to be handled on infra level - Ayal, your > input on this? > The multiple logging is done in the following places: > > BrokerCommandBase --> ProceedProxyReturnValue > VDSCommandBase --> setVdsRuntimeError Sounds reasonable to me. Please clone it, rename both to reflect the relevant problems etc. --- Additional comment from Alissa on 2013-07-09 10:43:06 EDT --- (In reply to Ayal Baron from comment #2) > (In reply to Alissa from comment #1) > > First problem that needs a fix related to the specific error that the error > > doesn't contain the name of the image file, it's not properly replaced and > > will make troubleshooting more difficult: > > "Drive image file %s could not be found". > > This means that we're lacking this information in the log? yes. I think the incomplete message is originated like this in vdsm so the bug should be moved from engine to vdsm component - this is how it's declared in define.py (and it's not defined like this in engine code): 'imageErr': {'status': { 'code': 13, 'message': 'Drive image file %s could not be found'}}, > > Second - regarding the multiple logging, it looks as this is not a storage > > related problem, but rather happens for many different commands. > > Perhaps the bug should be splitted to handle the first issue in storage > > area, and the multiple logging to be handled on infra level - Ayal, your > > input on this? > > The multiple logging is done in the following places: > > > > BrokerCommandBase --> ProceedProxyReturnValue > > VDSCommandBase --> setVdsRuntimeError > > Sounds reasonable to me. > Please clone it, rename both to reflect the relevant problems etc.
The multiple logging is done in the following places: BrokerCommandBase --> ProceedProxyReturnValue VDSCommandBase --> setVdsRuntimeError
Yaniv, don't try to reproduce with live storage migration. Just fix/understand the issue described in comment #1
Yaniv, following Comment #2 the log at BrokerCommandBase happens in the "default" part of the large switch. error code 13 is imageErr at VdcBllErrors. There are many other errors in VdcBllErrors which are handdled by the default part (so indeed you can use other flows to reproduce this). ProccedProxyReturnValue is being called from VmReplicateDiskStartVdsCommand and VmReplaicateDiskFinishVdsCommand - both extend the BrokerCommndBase. These VDS broker commands are called from the task handlers of LiveMigrateDisk, see LiveMigrateDiskCommand.initTaskHandlers() Since each one of these VDS commands also extend VdsCommandBase, you will see that setVdsRuntimeError(ex) is also being called (see VdsCommandBase.executeCommand One of the problems is that all the exceptions generated by ProceedProxyReturnValue extend (directly or indirectly) RuntimeException, and you can see what's going on in VDSCommandBase.executeCommand - the catch for RuntimeException performs the setVdsRuntimeError What i would suggest (of course other suggestions are posible :) is to wrap the exceptions generated at ProceedProxyReturnValue with Let's say a ProxyReturnValueException which extends RuntimeException, and then at the VdsCommandBase.executeCommand code, add a catch to ProxyReturnValueException which does not log (in contrast to catch RuntimeException)
Yair, I can catch VDSExceptionBase and call VDSCommandBase::setVdsRuntimeError with a parameter not to report to log. Isn't that enough?
Yaniv, did you check this out? How will you determine the value of the boolean flag? Anyway, I'm postponing to 3.4
fixed in commit : afc4d50
ovirt 3.4.0 alpha has been released
Verified on Ovirt 3.4.0-0.7beta2, vdsm-4.14.1-3.el6.x86_64. engine.log doesn't have multiple error messages at the same time.
Closing as part of 3.4.0