Bug 1371111
Summary: update dwh heartbeat error message to alert only after it did not update for a minute
Product: [oVirt] ovirt-engine-dwh
Reporter: Shirly Radco <sradco>
Component: ETL
Assignee: Shirly Radco <sradco>
Status: CLOSED CURRENTRELEASE
QA Contact: Lukas Svaty <lsvaty>
Severity: high
Docs Contact:
Priority: medium
Version: ---
CC: bugs, jbryant, kshukla, mburman, mgoldboi, mperina, mtessun, shipatil, sradco, trefex, usurse, ylavi
Target Milestone: ovirt-4.1.1-1
Flags: rule-engine: ovirt-4.1+, rule-engine: exception+
Target Release: 4.1.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause:
In some environments, the audit log was flooded with dwh errors because the dwh heartbeat was not updated within the required interval.
Consequence:
Many errors appeared in the Admin Portal audit log.
Fix:
The error is now sent to the audit log only if dwh was unable to sample data for at least a minute because its heartbeat was not updated.
Result:
Story Points: ---
Clone Of:
: 1373456 1430666
Environment:
Last Closed: 2017-04-21 09:37:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Metrics
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1373456, 1390631, 1430666
Attachments:
Description
Shirly Radco
2016-08-29 11:03:57 UTC
Can we have an engine log with DEBUG messages attached, so we can check which part of the code is responsible for that? I added DEBUG messages in patch https://gerrit.ovirt.org/#/c/64139/ to figure out what's going on.

Created attachment 1225785
engine.log

Please use this link due to the file size. engine.log: https://drive.google.com/open?id=0B8qzHycX6vljVlg5dVYzMHVGMkk

*** Bug 1425868 has been marked as a duplicate of this bug. ***

I have changed the title to reflect the upcoming changes. With this fix, the error is sent to the audit log only if the heartbeat has not been updated for at least a minute since the last sampling. The error is still written to the dwh log every time, since each occurrence means a sampling was missed.

*** Bug 1433101 has been marked as a duplicate of this bug. ***

Verified in ovirt-engine-dwh-4.1.1-1.el7ev.noarch. I was not able to see such a message in numerous setups. If you encounter this message again, please reopen this bug; we should then consider either extending the timeout or adjusting it based on the environment.
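The alerting rule described in this bug can be illustrated with a small sketch. This is a hypothetical Java example, not the actual ovirt-engine-dwh code: the class `HeartbeatAlertPolicy`, its `check` method, the 15-second sampling interval, and the in-memory lists standing in for the dwh log and the audit log are all assumptions made for illustration. Only the one-minute audit threshold and the rule that every missed sampling is still written to the dwh log come from this bug.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the alert policy from this bug (not the real
 * ovirt-engine-dwh implementation): every missed sampling is written to
 * the dwh log, but the audit log only gets an error once the heartbeat
 * has been stale for at least the audit threshold (one minute).
 */
public class HeartbeatAlertPolicy {
    private final Duration samplingInterval; // assumed value, see lead-in
    private final Duration auditThreshold;   // one minute per this bug

    // In-memory stand-ins for the dwh log and the audit log (hypothetical).
    final List<String> dwhLog = new ArrayList<>();
    final List<String> auditLog = new ArrayList<>();

    public HeartbeatAlertPolicy(Duration samplingInterval, Duration auditThreshold) {
        this.samplingInterval = samplingInterval;
        this.auditThreshold = auditThreshold;
    }

    /**
     * Checks the heartbeat at sampling time {@code now}.
     * Returns true if an audit log error was emitted by this check.
     */
    public boolean check(Instant lastHeartbeat, Instant now) {
        Duration stale = Duration.between(lastHeartbeat, now);
        if (stale.compareTo(samplingInterval) <= 0) {
            return false; // heartbeat updated within the interval: nothing to report
        }
        // A sampling was missed: always record it in the dwh log.
        dwhLog.add("dwh heartbeat not updated for " + stale.getSeconds() + "s");
        if (stale.compareTo(auditThreshold) >= 0) {
            // Only a sustained outage (>= threshold) reaches the audit log.
            auditLog.add("dwh unable to sample data for " + stale.getSeconds() + "s");
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        HeartbeatAlertPolicy policy =
            new HeartbeatAlertPolicy(Duration.ofSeconds(15), Duration.ofMinutes(1));
        Instant heartbeat = Instant.parse("2017-03-01T10:00:00Z");
        for (int seconds : new int[] {10, 20, 40, 60, 75}) {
            boolean audited = policy.check(heartbeat, heartbeat.plusSeconds(seconds));
            System.out.println(seconds + "s stale -> audit log: " + audited);
        }
        System.out.println("dwh log entries: " + policy.dwhLog.size());
        System.out.println("audit log entries: " + policy.auditLog.size());
    }
}
```

With these hypothetical values, the checks at 20 s and 40 s only reach the dwh log, while the checks at 60 s and 75 s also produce an audit log error. Keeping the threshold as a constructor parameter reflects the verification comment's suggestion that the timeout might later need to be extended or tuned per environment.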