Bug 982686

Summary: engine [logger]: same exception is logged multiple times in engine log
Product: Red Hat Enterprise Virtualization Manager Reporter: Alissa <abonas>
Component: ovirt-engineAssignee: Piotr Kliczewski <pkliczew>
Status: CLOSED CURRENTRELEASE QA Contact: sefi litmanovich <slitmano>
Severity: low Docs Contact:
Priority: unspecified    
Version: 3.2.0CC: abonas, acathrow, bazulay, bdagan, dron, emesika, iheim, jkt, lpeer, pstehlik, Rhev-m-bugs, yeylon, yzaslavs
Target Milestone: ---   
Target Release: 3.4.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: infra
Fixed In Version: ovirt-3.4.0-alpha1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 960635 Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 960635    
Bug Blocks: 1078909, 1142926    

Description Alissa 2013-07-09 14:54:33 UTC
+++ This bug was initially created as a clone of Bug #960635 +++

Description of problem:

I ran a vm on hsm host and restarted the vdsm on the hsm during the create volume step.
the live storage migration command fails and we report it 5 (not counting the INFO which also reports it)

(see more info)

Version-Release number of selected component (if applicable):

sf15
vdsm-4.10.2-17.0.el6ev.x86_64

How reproducible:

100%

Steps to Reproduce:
1. create and run a vm on iscsi pool with 2 hosts
2. live migrate the vm's disk
3. restart the vdsm on the hsm when task createVolume appears in spm
 
Actual results:

we fail the move and engine reports it 5 times in the log which would make debugging deficult in case we have multiple vms

Expected results:

we should diminish the amount of reported errors in the log


Additional info: logs

2013-05-07 16:14:10,043 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-46) [105ad651] Error code imageErr and error message VDSGenericException: VDSErrorException: Failed to VmReplicateDiskStartVDS, e
rror = Drive image file %s could not be found

2013-05-07 16:14:10,044 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-4-thread-46) [105ad651] Command VmReplicateDiskStartVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed
to VmReplicateDiskStartVDS, error = Drive image file %s could not be found

2013-05-07 16:14:10,044 ERROR [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (pool-4-thread-46) [105ad651] Command org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand throw Vdc Bll exception. With error message VdcBLLException:
 Drive image file %s could not be found

2013-05-07 16:14:10,070 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-46) [105ad651] Error code imageErr and error message VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS,
error = Drive image file %s could not be found

2013-05-07 16:14:10,070 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-4-thread-46) [105ad651] Command VmReplicateDiskFinishVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed
 to VmReplicateDiskFinishVDS, error = Drive image file %s could not be found

--- Additional comment from Alissa on 2013-07-09 08:04:05 EDT ---

First problem that needs a fix related to the specific error that the error doesn't contain the name of the image file, it's not properly replaced and will make troubleshooting more difficult: 
"Drive image file %s could not be found".

Second - regarding the multiple logging, it looks as this is not a storage related problem, but rather happens for many different commands.
Perhaps the bug should be splitted to handle the first issue in storage area, and the multiple logging to be handled on infra level - Ayal, your input on this?
The multiple logging is done in the following places:

BrokerCommandBase --> ProceedProxyReturnValue
VDSCommandBase --> setVdsRuntimeError

--- Additional comment from Ayal Baron on 2013-07-09 09:52:12 EDT ---

(In reply to Alissa from comment #1)
> First problem that needs a fix related to the specific error that the error
> doesn't contain the name of the image file, it's not properly replaced and
> will make troubleshooting more difficult: 
> "Drive image file %s could not be found".

This means that we're lacking this information in the log?

> 
> Second - regarding the multiple logging, it looks as this is not a storage
> related problem, but rather happens for many different commands.
> Perhaps the bug should be splitted to handle the first issue in storage
> area, and the multiple logging to be handled on infra level - Ayal, your
> input on this?
> The multiple logging is done in the following places:
> 
> BrokerCommandBase --> ProceedProxyReturnValue
> VDSCommandBase --> setVdsRuntimeError

Sounds reasonable to me.
Please clone it, rename both to reflect the relevant problems etc.

--- Additional comment from Alissa on 2013-07-09 10:43:06 EDT ---

(In reply to Ayal Baron from comment #2)
> (In reply to Alissa from comment #1)
> > First problem that needs a fix related to the specific error that the error
> > doesn't contain the name of the image file, it's not properly replaced and
> > will make troubleshooting more difficult: 
> > "Drive image file %s could not be found".
> 
> This means that we're lacking this information in the log?
yes. I think the incomplete message is originated like this in vdsm so the bug should be moved from engine to vdsm component - this is how it's declared in define.py (and it's not defined like this in engine code):
  'imageErr': {'status': {
        'code': 13,
        'message': 'Drive image file %s could not be found'}},

> > Second - regarding the multiple logging, it looks as this is not a storage
> > related problem, but rather happens for many different commands.
> > Perhaps the bug should be splitted to handle the first issue in storage
> > area, and the multiple logging to be handled on infra level - Ayal, your
> > input on this?
> > The multiple logging is done in the following places:
> > 
> > BrokerCommandBase --> ProceedProxyReturnValue
> > VDSCommandBase --> setVdsRuntimeError
> 
> Sounds reasonable to me.
> Please clone it, rename both to reflect the relevant problems etc.

Comment 1 Alissa 2013-07-09 14:56:59 UTC
The multiple logging is done in the following places:

BrokerCommandBase --> ProceedProxyReturnValue
VDSCommandBase --> setVdsRuntimeError

Comment 2 Barak 2013-08-13 11:56:50 UTC
Yaniv, don't try to reproduce with live storage migration.
Just fix/understand the issue described in comment #1

Comment 3 Yair Zaslavsky 2013-08-15 07:15:39 UTC
Yaniv,  following Comment #2 the log at BrokerCommandBase happens in the "default" part of the large switch.
error code 13 is imageErr at VdcBllErrors.
There are many other errors in VdcBllErrors which are handdled by the default part (so indeed you can use other flows to reproduce this).

ProccedProxyReturnValue is being called from VmReplicateDiskStartVdsCommand and
VmReplaicateDiskFinishVdsCommand  - both extend the BrokerCommndBase.
These VDS broker commands are called from the task handlers of LiveMigrateDisk, see
LiveMigrateDiskCommand.initTaskHandlers()
Since each one of these VDS commands also extend VdsCommandBase, you will see that setVdsRuntimeError(ex) is also being called (see VdsCommandBase.executeCommand

One of the problems is that all the exceptions generated by ProceedProxyReturnValue extend (directly or indirectly) RuntimeException,
and you can see what's going on in VDSCommandBase.executeCommand  - the catch for RuntimeException performs the setVdsRuntimeError

What i would suggest (of course other suggestions are posible :) is to wrap
the exceptions generated at ProceedProxyReturnValue with Let's say a ProxyReturnValueException which extends RuntimeException, and then at the VdsCommandBase.executeCommand code, add a catch to ProxyReturnValueException which does not log (in contrast to catch RuntimeException)

Comment 4 Yaniv Bronhaim 2013-09-02 13:36:35 UTC
Yair, I can catch VDSExceptionBase and call VDSCommandBase::setVdsRuntimeError with a parameter not to report to log. Isn't that enough?

Comment 5 Yair Zaslavsky 2013-09-16 04:27:46 UTC
Yaniv, did you check this out? How will you determine the value of the boolean flag?
Anyway, I'm postponing to 3.4

Comment 6 Eli Mesika 2013-11-20 09:53:43 UTC
fixed in commit : afc4d50

Comment 7 Sandro Bonazzola 2014-01-14 08:44:43 UTC
ovirt 3.4.0 alpha has been released

Comment 8 sefi litmanovich 2014-02-17 13:22:35 UTC
Verified on Ovirt 3.4.0-0.7beta2, vdsm-4.14.1-3.el6.x86_64.

engine.log doesn't have multiple error messages at the same time.

Comment 10 Itamar Heim 2014-06-12 14:07:21 UTC
Closing as part of 3.4.0