Description of problem:
All logs from all projects end up in the orphaned index in Elasticsearch. For log data, Elasticsearch contains only .operations indices and the orphaned index.

Version-Release number of selected component (if applicable):
redhat-release-server-7.6-4.el7.x86_64
atomic-openshift-3.11.98-1.git.0.0cbaff3.el7.x86_64
atomic-openshift-clients-3.11.98-1.git.0.0cbaff3.el7.x86_64
atomic-openshift-docker-excluder-3.11.98-1.git.0.0cbaff3.el7.noarch
atomic-openshift-excluder-3.11.98-1.git.0.0cbaff3.el7.noarch
atomic-openshift-hyperkube-3.11.98-1.git.0.0cbaff3.el7.x86_64
atomic-openshift-node-3.11.98-1.git.0.0cbaff3.el7.x86_64
atomic-registries-1.22.1-26.gitb507039.el7.x86_64
docker-1.13.1-94.gitb2f74b2.el7.x86_64
docker-client-1.13.1-94.gitb2f74b2.el7.x86_64
docker-common-1.13.1-94.gitb2f74b2.el7.x86_64
docker-rhel-push-plugin-1.13.1-94.gitb2f74b2.el7.x86_64

How reproducible:
I am not able to reproduce this issue.

Actual results:
Logs are stored in orphaned indices.

Expected results:
Logs from projects are stored in project.* indices.

Additional info:
We are not sure whether this is a configuration issue or whether we hit a bug. Logs from Elasticsearch:
---
Clustername: logging-es
Clusterstate: GREEN
Number of nodes: 1
Number of data nodes: 1
.searchguard index does not exists, attempt to create it ... done (0-all replicas)
Populate config from /opt/app-root/src/sgconfig/
Will update 'config' with /opt/app-root/src/sgconfig/sg_config.yml
SUCC: Configuration for 'config' created or updated
Will update 'roles' with /opt/app-root/src/sgconfig/sg_roles.yml
SUCC: Configuration for 'roles' created or updated
Will update 'rolesmapping' with /opt/app-root/src/sgconfig/sg_roles_mapping.yml
SUCC: Configuration for 'rolesmapping' created or updated
Will update 'internalusers' with /opt/app-root/src/sgconfig/sg_internal_users.yml
SUCC: Configuration for 'internalusers' created or updated
Will update 'actiongroups' with /opt/app-root/src/sgconfig/sg_action_groups.yml
SUCC: Configuration for 'actiongroups' created or updated
Done with success
--
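To confirm the index state described above, you can pull the index listing from the Elasticsearch pod (for example with `oc exec <es-pod> -- es_util --query=_cat/indices?v`, if that helper is available in your image) and tally the index names by prefix. A minimal sketch, assuming the usual 3.11 naming of `project.<name>.<uuid>.*`, `.operations.*`, and `.orphaned.*`; the function name and the saved listing file are hypothetical:

```shell
# Tally index name prefixes from a saved `_cat/indices` listing.
# $1: path to a file containing the listing output.
summarize_indices() {
  grep -Eo 'project\.[^ ]*|\.operations\.[^ ]*|\.orphaned\.[^ ]*' "$1" \
    | sed 's/^project\..*/project/; s/^\.operations\..*/operations/; s/^\.orphaned\..*/orphaned/' \
    | sort | uniq -c
}
```

In the situation reported here, the tally would show orphaned and operations indices but no project indices.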
Can you please describe when this occurred? Was it after an upgrade? Reviewing the logs, I see a point where fluentd is starting and is unable to contact Elasticsearch, which is indicative of an upgrade or a logging-start scenario. If fluentd is unable to contact the kube API server to fetch metadata, it pushes the logs to the 'orphaned' index. Often this comes from pods and/or namespaces which no longer exist, so metadata cannot be retrieved at all. Now that the logging stack is running, if you start a new pod are you still experiencing this issue?
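The routing described above can be sketched as a simple rule: when the metadata lookup fails, the record carries no namespace name or UUID, so the record falls back to an orphaned index instead of a project index. A hypothetical illustration only (the function name and exact index naming are assumptions, not the plugin's actual code):

```shell
# Sketch of the routing rule: records without namespace metadata go to .orphaned.*.
# $1: namespace name ("" if the metadata lookup failed)
# $2: namespace uuid ("" if the metadata lookup failed)
# $3: date suffix, e.g. 2019.06.01
index_for_record() {
  if [ -n "$1" ] && [ -n "$2" ]; then
    echo "project.$1.$2.$3"
  else
    echo ".orphaned.$3"
  fi
}
```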
Could be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1711596#c8 The cause there was that fluentd could not correctly determine, from the docker configuration files, which logging driver was being used for container logs.
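To compare the driver actually in effect with what the config files declare, `docker info --format '{{.LoggingDriver}}'` shows the live value on the node. The sketch below shows one rough way to read the declared value from the two files mentioned in this bug, assuming daemon.json takes precedence when both are set; the function name is hypothetical and this is not fluentd's actual detection code:

```shell
# Report the declared docker logging driver, checking daemon.json first.
# $1: path to daemon.json   $2: path to /etc/sysconfig/docker
detect_log_driver() {
  daemon_json=$1
  sysconfig=$2
  # daemon.json form:  "log-driver": "journald"
  driver=$(sed -n 's/.*"log-driver"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$daemon_json" 2>/dev/null | head -1)
  if [ -z "$driver" ] && [ -f "$sysconfig" ]; then
    # sysconfig form:  OPTIONS="... --log-driver=json-file ..."
    driver=$(sed -n 's/.*--log-driver=\([a-z0-9-]*\).*/\1/p' "$sysconfig" | head -1)
  fi
  echo "${driver:-unknown}"
}
```

A mismatch between the two files (or between either file and `docker info`) would be consistent with the misdetection described in bug 1711596.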
Please try this: oc set env ds/logging-fluentd DEBUG=true VERBOSE=true
This will restart all of your fluentd pods with tracing enabled so we can see what they are doing. Also, please provide your /etc/docker/daemon.json and /etc/sysconfig/docker from one of the nodes where fluentd is running.
https://github.com/openshift/origin-aggregated-logging/pull/1680
merged upstream https://github.com/openshift/origin-aggregated-logging/commit/396764296721ca67a73799357ca2451d484f16dc
*** Bug 1711596 has been marked as a duplicate of this bug. ***
Needs rubygem-fluent-plugin-kubernetes_metadata_filter-1.2.1-1.el7 - this is built and tagged into rhaos-3.11-rhel-7-candidate. NOTE: This rpm cannot be tagged into 3.10 and earlier; it requires that the fluentd config use the separate merge-JSON log parser. A customer who needs this particular fix will have to upgrade to 3.11. Next step: a 3.11 compose built with this package, then a logging-fluentd 3.11 image built with this rpm.
ART says 3.11 compose rebuild will be in about a week from now
the fix is in openshift3/ose-logging-fluentd:v3.11.130-1 or later - https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=935020
The fix and the gem are in openshift3/ose-logging-fluentd:v3.11.135.
Verified: journald container logs are parsed automatically, without setting USE_JOURNAL=true.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2352