Bug 1532955
| Summary: | Container logs were not sent to the ES stack | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Anping Li <anli> |
| Component: | Logging | Assignee: | Jeff Cantrill <jcantril> |
| Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.9.0 | CC: | aos-bugs, juzhao, rmeggins, wsun |
| Target Milestone: | --- | Keywords: | TestBlocker |
| Target Release: | 3.9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cause: The metadata pipeline relied on information that could be missing. Consequence: The missing required information caused record processing to error. Fix: Update the pipeline to cache metadata better and fall back to pushing the log into an orphaned index if needed. Result: Logs are pushed into storage as desired. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-03-28 14:18:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
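
The Doc Text above describes the fix: the metadata pipeline caches Kubernetes metadata more reliably and, when the metadata for a record still cannot be resolved, pushes the log into an orphaned index instead of erroring out. Below is a minimal shell sketch for checking where records actually land. The `logging` namespace, the `component=es` pod label, the `elasticsearch` container name, and the admin certificate paths are assumptions based on a default 3.x aggregated-logging deployment, not values confirmed in this bug; adjust them for the cluster at hand.

```bash
# Hedged sketch: list the indices Elasticsearch actually holds.
# Namespace, label, container name, and cert paths are assumed defaults
# of an OCP 3.x aggregated-logging deployment.
es_pod=$(oc get pods -n logging -l component=es \
  -o jsonpath='{.items[0].metadata.name}')

# Project logs normally appear as project.<name>.<uuid>.* indices; records whose
# Kubernetes metadata could not be resolved should land in an orphaned index
# rather than being dropped with an error.
oc exec -n logging "$es_pod" -c elasticsearch -- \
  curl -s --cacert /etc/elasticsearch/secret/admin-ca \
          --cert /etc/elasticsearch/secret/admin-cert \
          --key /etc/elasticsearch/secret/admin-key \
          'https://localhost:9200/_cat/indices?v'
```

If the project indices are missing while fluentd is running, that matches the failure mode reported below.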
Description
Anping Li
2018-01-10 05:26:30 UTC
Created attachment 1379359 [details]
Log dump files
Created attachment 1379361 [details]
fluentd log
Too many fluentd logs; I only attached one from the OpenShift master.
Container log collection testing is blocked.

The container logs can be found in OpenShift v3.9.0-0.19.0 when using the RPM installation, so moving the bug to medium severity. I will continue to check whether the issue exists in a containerized installation.

Still no project index in ES when using logging-fluentd/v3.9.0-0.20.0.0 in one cluster. There are too many fluentd logs. Will try again when https://bugzilla.redhat.com/show_bug.cgi?id=1531157 is fixed.

After running for a few minutes there are many ES pods whose status is Evicted. Describing one ES pod shows the following events:

    Events:
      Type     Reason   Age  From                               Message
      ----     ------   ---  ----                               -------
      Warning  Evicted  4m   kubelet, qe-juzhao-39-gcs-1-nrr-1  The node was low on resource: [DiskPressure].

    # oc get po | grep logging-es
    logging-es-data-master-ea1bmc86-1-5tnqs   0/2   Evicted   0   4m
    logging-es-data-master-ea1bmc86-1-78rvl   0/2   Evicted   0   4m
    logging-es-data-master-ea1bmc86-1-8v22l   0/2   Evicted   0   4m
    logging-es-data-master-ea1bmc86-1-b57vf   0/2   Evicted   0   4m
    logging-es-data-master-ea1bmc86-1-djdhq   0/2   Evicted   0   4m
    logging-es-data-master-ea1bmc86-1-fd8bb   0/2   Evicted   0   4m
    logging-es-data-master-ea1bmc86-1-gx48r   0/2   Evicted   0   44m
    logging-es-data-master-ea1bmc86-1-h9ksk   0/2   Evicted   0   4m
    logging-es-data-master-ea1bmc86-1-jbvn2   0/2   Evicted   0   4m
    logging-es-data-master-ea1bmc86-1-jfddt   0/2   Evicted   0   4m
    logging-es-data-master-ea1bmc86-1-jfxs7   0/2   Evicted   0   4m
    logging-es-data-master-ea1bmc86-1-lm25j   0/2   Evicted   0   4m
    logging-es-data-master-ea1bmc86-1-m7fds   0/2   Evicted   0   4m
    logging-es-data-master-ea1bmc86-1-mvrwz   0/2   Evicted   0   4m
    logging-es-data-master-ea1bmc86-1-pmmfs   0/2   Evicted   0   4m
    logging-es-data-master-ea1bmc86-1-pt78j   0/2   Evicted   0   4m
    logging-es-data-master-ea1bmc86-1-sc66g   0/2   Evicted   0   4m
    logging-es-data-master-ea1bmc86-1-tdm8w   0/2   Pending   0   4m
    logging-es-data-master-ea1bmc86-1-zdbpv   0/2   Evicted   0   4m
    logging-es-data-master-ea1bmc86-1-zwhq4   0/2   Evicted   0   4m

The project indices can be found with the following workaround (a sketch of these steps follows this comment):

1. Delete /var/log/es-containers.log.pos.
2. Modify the daemonset to the v3.7 fluentd image.
3. Modify the daemonset back to the v3.9 fluentd image.
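
A minimal shell sketch of the three workaround steps above. The daemonset name, container name, namespace, registry path, and image tags are assumptions based on a default 3.x aggregated-logging install, not values confirmed in this bug; treat it as the reporter's recovery procedure, not a supported fix.

```bash
# Hedged sketch of the workaround above; object names and image tags are assumed.

# 1. On each node, remove the stale in_tail position file so fluentd re-reads
#    the container logs from the beginning.
rm -f /var/log/es-containers.log.pos

# 2. Roll the collector daemonset back to the v3.7 image ...
oc set image -n logging daemonset/logging-fluentd \
  fluentd-elasticsearch=registry.example.com/openshift3/logging-fluentd:v3.7

# 3. ... then forward to the v3.9 image again, forcing the pods to restart.
oc set image -n logging daemonset/logging-fluentd \
  fluentd-elasticsearch=registry.example.com/openshift3/logging-fluentd:v3.9
```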
(In reply to Junqi Zhao from comment #6)
> After running for a few minutes, there are so many es pods which status is
> Evicted, describe one es pod, and find the events:
>     Events:
>       Type     Reason   Age  From                               Message
>       ----     ------   ---  ----                               -------
>       Warning  Evicted  4m   kubelet, qe-juzhao-39-gcs-1-nrr-1  The node was low on resource: [DiskPressure].

If you deploy logging on a system with enough disk space, does it fix this bug?

>     # oc get po | grep logging-es
>     logging-es-data-master-ea1bmc86-1-5tnqs   0/2   Evicted   0   4m
>     logging-es-data-master-ea1bmc86-1-78rvl   0/2   Evicted   0   4m
>     logging-es-data-master-ea1bmc86-1-8v22l   0/2   Evicted   0   4m
>     logging-es-data-master-ea1bmc86-1-b57vf   0/2   Evicted   0   4m
>     logging-es-data-master-ea1bmc86-1-djdhq   0/2   Evicted   0   4m
>     logging-es-data-master-ea1bmc86-1-fd8bb   0/2   Evicted   0   4m
>     logging-es-data-master-ea1bmc86-1-gx48r   0/2   Evicted   0   44m
>     logging-es-data-master-ea1bmc86-1-h9ksk   0/2   Evicted   0   4m
>     logging-es-data-master-ea1bmc86-1-jbvn2   0/2   Evicted   0   4m
>     logging-es-data-master-ea1bmc86-1-jfddt   0/2   Evicted   0   4m
>     logging-es-data-master-ea1bmc86-1-jfxs7   0/2   Evicted   0   4m
>     logging-es-data-master-ea1bmc86-1-lm25j   0/2   Evicted   0   4m
>     logging-es-data-master-ea1bmc86-1-m7fds   0/2   Evicted   0   4m
>     logging-es-data-master-ea1bmc86-1-mvrwz   0/2   Evicted   0   4m
>     logging-es-data-master-ea1bmc86-1-pmmfs   0/2   Evicted   0   4m
>     logging-es-data-master-ea1bmc86-1-pt78j   0/2   Evicted   0   4m
>     logging-es-data-master-ea1bmc86-1-sc66g   0/2   Evicted   0   4m
>     logging-es-data-master-ea1bmc86-1-tdm8w   0/2   Pending   0   4m
>     logging-es-data-master-ea1bmc86-1-zdbpv   0/2   Evicted   0   4m
>     logging-es-data-master-ea1bmc86-1-zwhq4   0/2   Evicted   0   4m

(In reply to Rich Megginson from comment #8)
> (In reply to Junqi Zhao from comment #6)
> > After running for a few minutes, there are so many es pods which status is
> > Evicted, describe one es pod, and find the events:
> >     Events:
> >       Type     Reason   Age  From                               Message
> >       ----     ------   ---  ----                               -------
> >       Warning  Evicted  4m   kubelet, qe-juzhao-39-gcs-1-nrr-1  The node was low on resource: [DiskPressure].
>
> If you deploy logging on a system with enough disk space, does it fix this bug?

It is not reproduced every time; I did not find this issue today, so I am not sure about your question.

Moving to Modified with the merge of https://github.com/openshift/origin-aggregated-logging/pull/898.

The fix isn't in 3.9.0-0.22.0.0; waiting for the next build.

The fix wasn't merged into logging-fluentd/images/v3.9.0-0.23.0.0:

    sh-4.2# gem list | grep fluent-plugin-kubernetes_metadata_filter
    fluent-plugin-kubernetes_metadata_filter (0.33.0)

(In reply to Rich Megginson from comment #8)
> If you deploy logging on a system with enough disk space, does it fix this bug?

Reproduced today; maybe related to https://bugzilla.redhat.com/show_bug.cgi?id=1531157.

The bug is fixed on logging-fluentd:v3.9.0-0.24.0.0.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489
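
For reference, the gem check used in the thread above can be run against any collector pod to confirm which fluent-plugin-kubernetes_metadata_filter version the image ships. The `logging` namespace and the `component=fluentd` label selector are assumptions based on a default install:

```bash
# Hedged sketch: inspect the metadata filter gem version in a running collector
# pod (namespace and label selector are assumed defaults).
fluentd_pod=$(oc get pods -n logging -l component=fluentd \
  -o jsonpath='{.items[0].metadata.name}')
oc exec -n logging "$fluentd_pod" -- \
  gem list | grep fluent-plugin-kubernetes_metadata_filter
```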