Bug 1460749
Summary: | Data loss of logs can occur if fluentd pod is terminated/restarted when Elasticsearch is unavailable | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Peter Portante <pportant> | |
Component: | Logging | Assignee: | Noriko Hosoi <nhosoi> | |
Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> | |
Severity: | urgent | Docs Contact: | ||
Priority: | urgent | |||
Version: | 3.4.1 | CC: | aos-bugs, jcantril, nhosoi, pportant, pweil, rmeggins, rromerom | |
Target Milestone: | --- | |||
Target Release: | 3.7.0 | |||
Hardware: | All | |||
OS: | All | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
Cause: Messages are read into fluentd's memory buffer and are lost if the pod is restarted, because fluentd considers them read even though they have not yet been pushed to storage.
Consequence: Any message already read by fluentd but not yet stored is lost.
Fix: Replace the memory buffer with a file-based buffer.
Result: File-buffered messages are pushed to storage once fluentd restarts.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1477513 1477515 1483114 | Environment: | ||
Last Closed: | 2017-11-28 21:56:55 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1491947 | |||
Bug Blocks: | 1477513, 1477515, 1483114 |
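The Cause/Fix pair in the Doc Text above can be illustrated with a small sketch. This is not fluentd's implementation; `MemoryBuffer`, `FileBuffer`, and `simulate_restart` are hypothetical names invented here, and the "restart" is modeled simply by discarding the buffer object and creating a new one.

```python
import json
import os
import tempfile


class MemoryBuffer:
    """Hypothetical buffer that keeps records only in process memory."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)

    def pending(self):
        return list(self._records)


class FileBuffer:
    """Hypothetical buffer that appends each record to a file on disk."""

    def __init__(self, path):
        self._path = path

    def append(self, record):
        # One JSON document per line; flush and fsync before returning so the
        # record is on disk before the caller treats it as read.
        with open(self._path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def pending(self):
        if not os.path.exists(self._path):
            return []
        with open(self._path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def clear(self):
        # Meant to be called only after the records were pushed to storage.
        if os.path.exists(self._path):
            os.remove(self._path)


def simulate_restart(make_buffer):
    """Buffer two records, drop the object (the "restart"), then re-create it."""
    buf = make_buffer()
    buf.append({"message": "first"})
    buf.append({"message": "second"})
    del buf  # the process goes away while storage is unavailable
    return make_buffer().pending()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "buffer.jsonl")
        print("memory:", simulate_restart(MemoryBuffer))
        print("file:  ", simulate_restart(lambda: FileBuffer(path)))
```

After the simulated restart the memory buffer comes back empty, while the file buffer still returns both records and can push them to storage.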
Description
Peter Portante
2017-06-12 15:08:41 UTC
Moving this to urgent as this is a blocker for 3.6, and it is critical for mux since there is no on-disk source of logs to recover from.

Noriko, did file buffer get in for 3.6? If so, please mark this bug as MODIFIED and include the PRs for openshift-ansible and origin-aggregated-logging, for the release-3.6 branch.

(In reply to Rich Megginson from comment #4)
> Noriko, did file buffer get in for 3.6? If so, please mark this bug as
> MODIFIED and include the PRs for openshift-ansible and
> origin-aggregated-logging, for the release-3.6 branch.

No. The merge has not happened on either the master or the release-3.6 branch...

https://github.com/openshift/origin-aggregated-logging/pull/556 -- master
https://github.com/openshift/origin-aggregated-logging/pull/559 -- release-3.6

I noticed the pull requests have no flags like these:
component/fluentd priority/P0 release/3.[67]
Should I have set them? If so, could you tell me how?

(In reply to Noriko Hosoi from comment #5)
> (In reply to Rich Megginson from comment #4)
> > Noriko, did file buffer get in for 3.6? If so, please mark this bug as
> > MODIFIED and include the PRs for openshift-ansible and
> > origin-aggregated-logging, for the release-3.6 branch.
>
> No. The merge has not happened on either the master or the release-3.6 branch...
>
> https://github.com/openshift/origin-aggregated-logging/pull/556 -- master
> https://github.com/openshift/origin-aggregated-logging/pull/559 --
> release-3.6
>
> I noticed the pull requests have no flags like these:
> component/fluentd priority/P0 release/3.[67]
> Should I have set them? If so, could you tell me how?

The flags aren't really necessary; they are just helpful when looking at the list of PRs to know at a glance what the PR is all about.

Once the 3.6 branch opens for 3.6.1 PRs, we'll get this merged.

The bug verification work is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1482002

Reassigning to @juzhao, as he is the Trello card owner.

Verification steps:
1. Use mux to test; set the following parameters in the inventory file:
   openshift_logging_use_mux=true
   openshift_logging_mux_client_mode=maximal
2. Create one project to populate logs.
3. Stop the fluentd pods, and note down the last project logs in Kibana.
4. Wait for a while, and restart the fluentd pods.
5. Check the logs that follow those noted in step 3; no logs are missing.
6. Repeat steps 3 to 5 to make sure no log is missing.

Test env:
# openshift version
openshift v3.7.0-0.126.4
kubernetes v1.7.0+80709908fd
etcd 3.2.1

Images:
logging-curator-v3.7.0-0.126.4.0
logging-elasticsearch-v3.7.0-0.126.4.0
logging-fluentd-v3.6.173.0.28-1
logging-kibana-v3.7.0-0.126.4.0
logging-auth-proxy-v3.7.0-0.126.4.0

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
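The "no log is missing" check in the verification steps above could be scripted rather than eyeballed in Kibana. The sketch below assumes the test project writes log lines carrying an incrementing `seq=<n>` field; that line format and the helper names are hypothetical and do not come from this bug report.

```python
import re

# Hypothetical log line format: each test message contains "seq=<number>".
SEQ_PATTERN = re.compile(r"seq=(\d+)")


def extract_sequence_numbers(lines):
    """Collect the sequence number from every line that carries one."""
    numbers = []
    for line in lines:
        match = SEQ_PATTERN.search(line)
        if match:
            numbers.append(int(match.group(1)))
    return numbers


def find_missing(numbers):
    """Return the sequence numbers absent between the lowest and highest seen."""
    if not numbers:
        return []
    present = set(numbers)
    return [n for n in range(min(present), max(present) + 1) if n not in present]


if __name__ == "__main__":
    # Made-up sample lines standing in for logs exported after a restart.
    sample = [
        "2017-09-01T10:00:00Z test-project seq=1 hello",
        "2017-09-01T10:00:01Z test-project seq=2 hello",
        "2017-09-01T10:00:03Z test-project seq=4 hello",
    ]
    print(find_missing(extract_sequence_numbers(sample)))
```

This only finds gaps between the lowest and highest sequence numbers seen, so messages lost at the very end of a run would still need to be compared against the last number the test project emitted.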