Description of problem:Message status returned from emsrefresh of RHEVM always shows a status of 'ok' even when refresh encounters errors. Version-Release number of selected component (if applicable):5.5.2.4 How reproducible: Steps to Reproduce: 1. Register a RHEVM provider into CFME 5.5.2.4 appliance 2. Manually initiate a full refresh and kill the session from the evmrefresh worker to the RHEVM appliance using targeted kill or alter the network latency/delay to be so long that the tcp session used to collect data timesout. 3. examine the evm.log file to see what the message status value is. Actual results: the results will show 'ok' Expected results: message status to be 'error' or 'timedout' Additional info: customer logs from multiple RHEVM process showing multiple refreshes are included as well as spreadsheet showing refresh start times and pid, with refresh durations ranging from 2 minutes to > one hour for the same RHEVM instance
Created attachment 1137882 [details] one page pdf detailing multiple ems refreshes from same 3002 PID with their exeuction times. Only the 3400+ second ems refresh actually processed all expected data.
Created attachment 1137883 [details] distilled evm.log lines from process 3002 containing ems refreshes identified in pdf already provided
Dave, can QE test this scenario with a couple different providers? I would like to determine if this is a RHEVM specific problem. Or, if it's across the board for all providers. Marcel, assigning to you for now until we get more information from QE on whether this is an across the board problem with reporting an error on EMS Refresh, or if it's a RHEVM specific problem.
@Thomas Hennessy could you explain, where you would expect an error to be shown? In the UI there should be an error displayed for that specific EMS. Also the log shows and ERROR for that specific EMS. From what I understand, you a referring to the log file, where the "State: [ok]" appears. But this refers to the Queue Message being delivered and executed.
Marcel, IMHO the existence of the log line message containing the completed for any message ID indicates that the message number was delivered and completed. the status line should indicate if the requested command was completed successfully or not. If the log lines between the initiation of the message (ie message dequeued) and the message completion log line contains any errors that indicates to me that the initiated command encountered error and therefore the status of the execution of the requested command cannot possibly be 'ok'. Tom Hennessy
ok, thanks for the clarification. Just to be 100% sure: you want this [----] I, [2016-03-16T10:18:06.067938 #3002:90798c] INFO -- : MIQ(MiqQueue#delivered) Message id: [2000001124178], State: [ok], Delivered in [157.827267344] seconds to be changed into [----] I, [2016-03-16T10:18:06.067938 #3002:90798c] INFO -- : MIQ(MiqQueue#delivered) Message id: [2000001124178], State: [error], Delivered in [157.827267344] seconds
Marcel, yes. more generally if the command being initiated by the message id (Command: [EmsRefresh.refresh] in this case) encounters errors in the execution of the command (as is reported by the error messages in the evm.log) it seems to me any any reasonable customer would expect that the status returned would reflect that highest level of error encountered. In this case the 'error' as reported by multipe log lines. Tom Hennessy
on dev: https://github.com/ManageIQ/manageiq/pull/9907