Bug 1722380 - Logging data from all projects are stored to .orphaned indexes with Elasticsearch
Summary: Logging data from all projects are stored to .orphaned indexes with Elasticsearch
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Rich Megginson
QA Contact: Anping Li
URL:
Whiteboard:
Duplicates: 1711596
Depends On:
Blocks: 1722898 1724263
 
Reported: 2019-06-20 08:14 UTC by Radomir Ludva
Modified: 2019-08-13 14:09 UTC
CC: 5 users

Fixed In Version: openshift3/ose-logging-fluentd:v3.11.130-1
Doc Type: Bug Fix
Doc Text:
Cause: Fluentd is unable to correctly determine the docker log driver. It thinks the log driver is journald when it is json-file. Fluentd then looks for the `CONTAINER_NAME` field in the record to hold the kubernetes metadata, and it is not present.
Consequence: Fluentd is not able to add kubernetes metadata to records. Records go to the .orphaned index. Fluentd spews lots of errors like this: [error]: record cannot use elasticsearch index name type project_full: record is missing kubernetes field
Fix: Fluentd should not rely on reading the docker configuration file to determine if the record contains kubernetes metadata. It should look at both the record tag and the record data and use whatever kubernetes metadata it finds there.
Result: Fluentd can correctly add kubernetes metadata and assign records to the correct indices no matter which log driver docker is using. Records read from files under /var/log/containers/*.log will have a fluentd tag like kubernetes.var.log.containers.**. This applies to both CRI-O and docker file logs. Kubernetes records read from journald with CONTAINER_NAME will have a tag like journal.kubernetes.**. There is no CRI-O journald log driver yet, and it is not clear how those records will be represented, but hopefully they will follow the same CONTAINER_NAME convention, in which case they will Just Work.
Clone Of:
Clones: 1722898 1724263
Environment:
Last Closed: 2019-08-13 14:09:19 UTC
Target Upstream Version:
Embargoed:
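
Regarding the Doc Text above: a quick way to see which of the two container log sources a node is actually producing is shown below (a minimal sketch, run directly on a node; the paths are the usual OCP 3.11 defaults and may differ in your environment):

# File-based container logs: these are what fluentd tags kubernetes.var.log.containers.**
ls /var/log/containers/*.log 2>/dev/null | head -5
# Journald container records: these carry CONTAINER_NAME and are tagged journal.kubernetes.**
journalctl -o json -n 200 --no-pager | grep -c CONTAINER_NAME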




Links:
- GitHub: openshift/origin-aggregated-logging pull 1680 (closed) - Bug 1722380: Logging data from all projects are stored to .orphaned indexes with Elasticsearch (last updated 2020-10-26 13:57:01 UTC)
- Red Hat Product Errata: RHBA-2019:2352 (last updated 2019-08-13 14:09:30 UTC)

Description Radomir Ludva 2019-06-20 08:14:17 UTC
Description of problem:
All logs from all projects are being sent to the .orphaned indices in Elasticsearch. Elasticsearch is using only the operations indices and the .orphaned indices for logs.

Version-Release number of selected component (if applicable):
redhat-release-server-7.6-4.el7.x86_64
atomic-openshift-3.11.98-1.git.0.0cbaff3.el7.x86_64
atomic-openshift-clients-3.11.98-1.git.0.0cbaff3.el7.x86_64
atomic-openshift-docker-excluder-3.11.98-1.git.0.0cbaff3.el7.noarch
atomic-openshift-excluder-3.11.98-1.git.0.0cbaff3.el7.noarch
atomic-openshift-hyperkube-3.11.98-1.git.0.0cbaff3.el7.x86_64
atomic-openshift-node-3.11.98-1.git.0.0cbaff3.el7.x86_64
atomic-registries-1.22.1-26.gitb507039.el7.x86_64
docker-1.13.1-94.gitb2f74b2.el7.x86_64
docker-client-1.13.1-94.gitb2f74b2.el7.x86_64
docker-common-1.13.1-94.gitb2f74b2.el7.x86_64
docker-rhel-push-plugin-1.13.1-94.gitb2f74b2.el7.x86_64

How reproducible:
I am not able to reproduce this issue.

Actual results:
Logs are stored in the .orphaned indices.

Expected results:
Logs from projects are stored in their per-project (project.*) indices.


Additional info:
We are not sure whether this is a configuration issue or a bug we have hit. Logs from Elasticsearch:
---
Clustername: logging-es
Clusterstate: GREEN
Number of nodes: 1
Number of data nodes: 1
.searchguard index does not exists, attempt to create it ... done (0-all replicas)
Populate config from /opt/app-root/src/sgconfig/
Will update 'config' with /opt/app-root/src/sgconfig/sg_config.yml
   SUCC: Configuration for 'config' created or updated
Will update 'roles' with /opt/app-root/src/sgconfig/sg_roles.yml
   SUCC: Configuration for 'roles' created or updated
Will update 'rolesmapping' with /opt/app-root/src/sgconfig/sg_roles_mapping.yml
   SUCC: Configuration for 'rolesmapping' created or updated
Will update 'internalusers' with /opt/app-root/src/sgconfig/sg_internal_users.yml
   SUCC: Configuration for 'internalusers' created or updated
Will update 'actiongroups' with /opt/app-root/src/sgconfig/sg_action_groups.yml
   SUCC: Configuration for 'actiongroups' created or updated
Done with success
--
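
For reference, one way to confirm where the records are landing (a minimal sketch, assuming the openshift-logging namespace, the component=es pod label, and the default admin cert paths used by the OCP 3.11 logging Elasticsearch image; adjust for your cluster):

# List indices from inside the Elasticsearch pod; .orphaned.* entries hold records
# that could not be matched to a project, project.* entries are the normal case.
espod=$(oc -n openshift-logging get pods -l component=es -o jsonpath='{.items[0].metadata.name}')
oc -n openshift-logging exec "$espod" -c elasticsearch -- \
  curl -s --cacert /etc/elasticsearch/secret/admin-ca \
       --cert /etc/elasticsearch/secret/admin-cert \
       --key /etc/elasticsearch/secret/admin-key \
       'https://localhost:9200/_cat/indices?v' | grep -E 'orphaned|project'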

Comment 3 Jeff Cantrill 2019-06-20 14:26:50 UTC
Can you please describe when this occurred? Was this after an upgrade? Reviewing the logs, I see a point where fluentd is starting and unable to contact Elasticsearch, which is indicative of an upgrade or a logging-startup scenario. If fluentd is unable to contact the kube API server to fetch metadata, it will push the logs to the 'orphaned' index. Often these are records from pods and/or namespaces which no longer exist, for which it cannot retrieve metadata at all.

If you start a new pod now that the logging stack is running, are you still experiencing this issue?
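
For example, a throwaway test along these lines (the project and pod names here are arbitrary):

# Create a test project and a pod that logs a line every few seconds.
oc new-project orphan-test
oc run log-test --image=busybox --restart=Never -- \
  sh -c 'i=0; while true; do echo "orphan-test line $i"; i=$((i+1)); sleep 5; done'
# After a few minutes, check whether a project.orphan-test.* index shows up
# (using the index listing from the description) or whether these records
# also end up in .orphaned.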

Comment 4 Rich Megginson 2019-06-20 17:10:42 UTC
Could be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1711596#c8

The cause there is that fluentd could not correctly determine which logging driver was being used for container logs by looking at the docker configuration file.
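
To see which log driver docker is actually using, something like the following can be run directly on the node (a quick sketch):

# What docker itself reports as the active log driver (json-file vs journald).
docker info --format '{{.LoggingDriver}}'
# Where the driver may be configured, if anywhere.
grep -i 'log-driver' /etc/docker/daemon.json /etc/sysconfig/docker 2>/dev/null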

Comment 5 Rich Megginson 2019-06-20 17:12:26 UTC
Please try this:

oc set env ds/logging-fluentd DEBUG=true VERBOSE=true

This will restart all of your fluentd pods with tracing so we can see what it is doing.

Also, please provide your /etc/docker/daemon.json and /etc/sysconfig/docker from one of your nodes where fluentd is running.
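
For example, from a node where fluentd is running (the component=fluentd label and the openshift-logging namespace are the usual 3.11 defaults and may differ):

# The two docker configuration files requested above.
cat /etc/docker/daemon.json
cat /etc/sysconfig/docker
# Capture the fluentd trace output after the DEBUG/VERBOSE restart.
fpod=$(oc -n openshift-logging get pods -l component=fluentd -o jsonpath='{.items[0].metadata.name}')
oc -n openshift-logging logs -f "$fpod"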

Comment 11 Rich Megginson 2019-06-26 15:35:35 UTC
*** Bug 1711596 has been marked as a duplicate of this bug. ***

Comment 15 Rich Megginson 2019-07-12 19:47:08 UTC
Needs rubygem-fluent-plugin-kubernetes_metadata_filter-1.2.1-1.el7 - this is built and tagged into rhaos-3.11-rhel-7-candidate

NOTE: This rpm cannot be tagged into 3.10 and earlier. It requires that the fluentd config use the separate merge JSON log parser.

A customer that needs this particular fix will have to upgrade to 3.11.

Next step: a 3.11 compose needs to be built with this package, then the 3.11 logging-fluentd image built with that rpm.

Comment 16 Rich Megginson 2019-07-12 22:24:38 UTC
ART says the 3.11 compose rebuild will happen in about a week.

Comment 18 Rich Megginson 2019-07-22 16:20:03 UTC
The fix is in openshift3/ose-logging-fluentd:v3.11.130-1 or later - https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=935020
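
To check which fluentd image a cluster is actually running, a minimal check (assuming the openshift-logging namespace):

# Should report openshift3/ose-logging-fluentd:v3.11.130-1 or later for this fix.
oc -n openshift-logging get ds/logging-fluentd \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'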

Comment 20 Anping Li 2019-08-07 06:24:03 UTC
The fix and the gem are in openshift3/ose-logging-fluentd:v3.11.135.

Comment 21 Anping Li 2019-08-07 11:22:47 UTC
Journald container logs are now parsed automatically, without setting USE_JOURNAL=true.
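
To confirm that the daemonset is not relying on the old flag, a quick check (assuming the openshift-logging namespace):

# An empty result means USE_JOURNAL is unset; the fixed fluentd detects the
# log source from the record tag and data instead.
oc -n openshift-logging set env ds/logging-fluentd --list | grep USE_JOURNAL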

Comment 23 errata-xmlrpc 2019-08-13 14:09:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2352

