Description of problem:
The engine heartbeat should update every 15 seconds, but in some cases it takes longer. If it takes longer than 20 seconds, dwh alerts "Can not sample data, oVirt Engine is not updating the statistics".

Version-Release number of selected component (if applicable):
4.0.2

How reproducible:

Steps to Reproduce:
1. Put the engine machine, with dwh installed, under load.

Actual results:
Multiple "Can not sample data, oVirt Engine is not updating the statistics" errors appear in the log.

Expected results:
dwh should not alert every time. It should wait about a minute before alerting, to give the connection time to recover and to avoid flooding the user with errors.

Additional info:
Can we have the engine log with DEBUG messages attached, so we can check which part of the code is responsible? I added DEBUG messages in patch https://gerrit.ovirt.org/#/c/64139/ to figure out what's going on.
Created attachment 1225785: engine.log
Please use this link due to file size. engine.log: https://drive.google.com/open?id=0B8qzHycX6vljVlg5dVYzMHVGMkk
*** Bug 1425868 has been marked as a duplicate of this bug. ***
I have changed the title to reflect the upcoming changes.
With this fix, the error message is sent to the audit log only if the heartbeat has not been updated for at least a minute since the last sampling. The error message is still written to the dwh log every time, since each occurrence means that a sampling was missed.
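The behaviour described in the fix can be sketched roughly as follows. This is only an illustration, not the actual dwh code: the class HeartbeatMonitor, the SampleResult type, and both constant names are hypothetical, and it assumes that "the last sampling" means the last successful sampling. The 20 second and one minute values are taken from this report.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the bug report; the names are hypothetical.
STALE_AFTER_SECONDS = 20   # heartbeat older than this means a missed sampling
AUDIT_AFTER_SECONDS = 60   # audit log is only notified after about a minute

MESSAGE = "Can not sample data, oVirt Engine is not updating the statistics"


@dataclass
class SampleResult:
    sampled: bool              # True if the sampling went ahead
    dwh_log: Optional[str]     # message for the dwh log, if any
    audit_log: Optional[str]   # message for the audit log, if any


class HeartbeatMonitor:
    """Hypothetical helper that decides where a missed sampling is reported."""

    def __init__(self, start: float) -> None:
        # Assumption: the monitor start counts as the last successful sampling.
        self.last_good_sample = start

    def check(self, now: float, last_heartbeat: float) -> SampleResult:
        # Fresh heartbeat: sample normally and remember when that happened.
        if now - last_heartbeat <= STALE_AFTER_SECONDS:
            self.last_good_sample = now
            return SampleResult(True, None, None)

        # Stale heartbeat: always report to the dwh log, but only report
        # to the audit log once a full minute has passed without a sampling.
        overdue = now - self.last_good_sample >= AUDIT_AFTER_SECONDS
        return SampleResult(False, MESSAGE, MESSAGE if overdue else None)
```

With these assumptions, a short stall under load produces only dwh log entries, while a stall lasting a minute or more also reaches the audit log. Timestamps are passed in as plain seconds so the logic can be exercised without a real clock.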
*** Bug 1433101 has been marked as a duplicate of this bug. ***
Verified in ovirt-engine-dwh-4.1.1-1.el7ev.noarch. I was not able to see this message on numerous setups. If you encounter this message again, please reopen this bug, and we should then consider either extending the timeout or adjusting it based on the environment.