Bug 1722380
| Summary: | Logging data from all projects are stored to .orphaned indexes with Elasticsearch | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Radomir Ludva <rludva> |
| Component: | Logging | Assignee: | Rich Megginson <rmeggins> |
| Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.11.0 | CC: | anli, aos-bugs, gabriela, jcantril, rmeggins |
| Target Milestone: | --- | | |
| Target Release: | 3.11.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openshift3/ose-logging-fluentd:v3.11.130-1 | Doc Type: | Bug Fix |
Doc Text:

Cause: Fluentd is unable to correctly determine the docker log driver: it decides the log driver is journald when it is actually json-file. Fluentd then looks for the `CONTAINER_NAME` field in the record to hold the kubernetes metadata, but that field is not present.

Consequence: Fluentd is not able to add kubernetes metadata to records. Records go to the .orphaned index, and fluentd emits many errors like this:

[error]: record cannot use elasticsearch index name type project_full: record is missing kubernetes field

Fix: Fluentd should not rely on reading the docker configuration file to determine whether the record contains kubernetes metadata. It should look at both the record tag and the record data and use whatever kubernetes metadata it finds there.

Result: Fluentd can correctly add kubernetes metadata and assign records to the correct indices no matter which log driver docker is using.

Records read from files under /var/log/containers/*.log will have a fluentd tag like kubernetes.var.log.containers.**; this applies to both CRI-O and docker file logs. Kubernetes records read from journald with CONTAINER_NAME will have a tag like journal.kubernetes.** (see the illustrative filter sketch after this table). There is no CRI-O journald log driver yet, and it is not clear how those records will be represented, but hopefully they will follow the same CONTAINER_NAME convention, in which case they will Just Work.
| Story Points: | --- | | |
|---|---|---|---|
| Clone Of: | | | |
| : | 1722898, 1724263 | Environment: | |
| Last Closed: | 2019-08-13 14:09:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1722898, 1724263 | | |
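
As a minimal sketch of the tag conventions named in the Doc Text above, assuming the upstream fluent-plugin-kubernetes_metadata_filter plugin type name (`kubernetes_metadata`): this is not the fluent.conf shipped in the logging image, and the surrounding source and output sections are omitted.

```
# Illustrative fragment only; not the configuration shipped in
# openshift3/ose-logging-fluentd. File-based container logs (docker
# json-file or CRI-O) read from /var/log/containers/*.log arrive with a
# tag like kubernetes.var.log.containers.**, while journald records that
# carry CONTAINER_NAME arrive with a tag like journal.kubernetes.**.
# Matching both tag patterns lets one filter enrich every record with
# kubernetes metadata, so the docker log driver never has to be guessed
# from /etc/docker/daemon.json.
<filter kubernetes.var.log.containers.** journal.kubernetes.**>
  @type kubernetes_metadata
</filter>
```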
Description
Radomir Ludva
2019-06-20 08:14:17 UTC
Can you please describe when this occurred? Was this after an upgrade? Reviewing the logs, I see there is a point where fluentd is starting and is unable to contact Elasticsearch. This is indicative of an upgrade or logging-start scenario. If fluentd is unable to contact the kube API server in order to fetch metadata, it will push the logs to the 'orphaned' index. Many times these records come from pods and/or namespaces which no longer exist, so it cannot retrieve metadata for them at all. If you start a new pod now that the logging stack is running, are you still experiencing this issue?

This could be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1711596#c8. The cause there is that fluentd could not correctly determine which logging driver was being used for container logs by looking at the docker configuration file. Please try this:

oc set env ds/logging-fluentd DEBUG=true VERBOSE=true

This will restart all of your fluentd pods with tracing enabled so we can see what fluentd is doing. Also, please provide your /etc/docker/daemon.json and /etc/sysconfig/docker from one of the nodes where fluentd is running (an illustrative daemon.json is sketched at the end of this report).

Merged upstream: https://github.com/openshift/origin-aggregated-logging/commit/396764296721ca67a73799357ca2451d484f16dc

*** Bug 1711596 has been marked as a duplicate of this bug. ***

Needs rubygem-fluent-plugin-kubernetes_metadata_filter-1.2.1-1.el7; this is built and tagged into rhaos-3.11-rhel-7-candidate. NOTE: this rpm cannot be tagged into 3.10 and earlier. It requires that the fluentd config is using the separate merge json log parser, so a customer that needs this particular fix will have to upgrade to 3.11. Next step: a 3.11 compose needs to be built with this package, then the logging-fluentd 3.11 image built with this rpm.

ART says the 3.11 compose rebuild will happen in about a week from now.

The fix is in openshift3/ose-logging-fluentd:v3.11.130-1 or later: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=935020

The fix and the gem are in openshift3/ose-logging-fluentd:v3.11.135. The journald container logs are parsed automatically without USE_JOURNAL=true.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2352
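
For context on the daemon.json request above, a purely illustrative example (not an attachment from this bug): a node whose docker daemon uses the json-file log driver typically declares it in /etc/docker/daemon.json roughly like this. The underlying problem was that fluentd inferred the log driver from configuration files such as this one (or /etc/sysconfig/docker) rather than from each record's tag and fields, and guessed journald even when json-file was in use.

```json
{
  "log-driver": "json-file"
}
```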