Bug 1632130 - [3.9] Fluentd cannot handle S2I Logs
Summary: [3.9] Fluentd cannot handle S2I Logs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: 3.9.z
Assignee: Rich Megginson
QA Contact: Qiaoling Tang
URL:
Whiteboard:
Depends On:
Blocks: 1632361 1632364
TreeView+ depends on / blocked
 
Reported: 2018-09-24 07:52 UTC by Noelle Frank
Modified: 2018-11-20 03:12 UTC (History)
6 users (show)

Fixed In Version: openshift3/logging-fluentd:v3.9.47-1
Doc Type: Bug Fix
Doc Text:
Cause: When using docker with the journald log driver, all container logs, including system and plain docker container logs, are logged to the journal, and read by fluentd. Consequence: fluentd does not know how to handle these non-kubernetes container logs and throws exceptions. Fix: Treat non-kubernetes container logs as logs from other system services e.g. send them to the .operations.* index. Result: Logs from non-kubernetes containers are indexed correctly and do not cause any errors.
Clone Of:
: 1632361 (view as bug list)
Environment:
Last Closed: 2018-11-20 03:12:03 UTC


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2908 None None None 2018-11-20 03:12:44 UTC
Github openshift origin-aggregated-logging pull 1360 None None None 2018-09-24 17:10:21 UTC
Red Hat Knowledge Base (Solution) 3628941 None None None 2018-09-27 18:40:59 UTC

Description Noelle Frank 2018-09-24 07:52:26 UTC
Description of problem:
Fluentd does not handle logs from "s2i" containers.
Fluentd expects all container logs from the journal have a CONTAINER_NAME value with a "k8s_" prefix - it doesn't know how to handle the "s2i_" prefix.

Version-Release number of selected component (if applicable): OCP 3.9.27

How reproducible:
Create S2I pod and look for logs in Kibana


Steps to Reproduce:
1. Create a project/namespace. 
2. Create a couple S2I pods in project along with other pods, can use console and pick from catalog. 
3. Later, go to Kibana with the user owner of the project or an user with cluster admin role.

Actual results: 
Logs will be visible for some pods, but not S2I pods.

Expected results:
Logs should be visible for all pods.


Additional info:

Abstract from our cluster's logs:

2018-09-19 07:48:17 -0400 [warn]: dump an error event: error_class=TypeError error="no implicit conversion of nil into String" location="/usr/share/gems/gems/fluent-plugin-viaq_data_model-0.0.14/lib/fluent/plugin/filter_viaq_data_model.rb:413:in `parse'" tag="journal.container._openshift_" time=1536076266 record={"PRIORITY"=>"6", "_TRANSPORT"=>"journal", "_PID"=>"10183", "_UID"=>"0", "_GID"=>"0", "_COMM"=>"dockerd-current", "_EXE"=>"/usr/bin/dockerd-current", "_CMDLINE"=>"/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --selinux-enabled --log-driver=journald --signature-verification=False --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker--vg-docker--pool --storage-opt dm.use_deferred_removal=true --mtu=8951 --add-registry registry.access.redhat.com --add-registry registry.access.redhat.com", "_CAP_EFFECTIVE"=>"1fffffffff", "_SYSTEMD_CGROUP"=>"/system.slice/docker.service", "_SYSTEMD_UNIT"=>"docker.service", "_SYSTEMD_SLICE"=>"system.slice", "_SELINUX_CONTEXT"=>"system_u:system_r:container_runtime_t:s0", "_BOOT_ID"=>"500dce08e3ef491b9107b3d389cd2966", "_MACHINE_ID"=>"cec1cf01289d4a53b9488a651ddca907", "_HOSTNAME"=>"node9.rhpds.internal", "CONTAINER_ID"=>"448aa02d4d4a", "CONTAINER_ID_FULL"=>"448aa02d4d4abf1a6a07590fb8a400196501a49eaf38173de0b89dca4f564c03", "CONTAINER_NAME"=>"s2i_registry_access_redhat_com_jboss_eap_7_eap70_openshift_sha256_7b4d8986212601403ca07b320fd5dadb9f624298251620b8f6e2b55f993d9124_d50b4d88", "MESSAGE"=>"[INFO] Downloading: http://nexus.opentlc-shared.svc.cluster.local:8081/repository/maven-all-public/org/jboss/spec/jboss-javaee-7.0/1.0.3.Final-redhat-2/jboss-javaee-7.0-1.0.3.Final-redhat-2.pom", "_SOURCE_REALTIME_TIMESTAMP"=>"1536076266567114", "pipeline_metadata"=>{"collector"=>{"ipaddr4"=>"10.1.21.57", "ipaddr6"=>"fe80::9889:7aff:fe35:4a94", "inputname"=>"fluent-plugin-systemd", "name"=>"fluentd", "received_at"=>"2018-09-19T11:48:17.193205+00:00", "version"=>"0.12.43 1.6.0"}}}

Comment 1 Rich Megginson 2018-09-25 17:31:39 UTC
PR merged upstream - waiting for next automated sync/build

Comment 3 Rich Megginson 2018-09-25 18:04:41 UTC
Setting to MODIFIED as per https://mojo.redhat.com/docs/DOC-1178565 "How do I get my Bugzilla Bug to VERIFIED?"

Comment 4 Rich Megginson 2018-09-27 19:50:38 UTC
Update:  The bug only affects containers which have _openshift_ or _openshift-infra_ in the container name.  From the original error message:

"CONTAINER_NAME"=>"s2i_registry_access_redhat_com_jboss_eap_7_eap70_openshift_sha256_7b4d8986212601403ca07b320fd5dadb9f624298251620b8f6e2b55f993d9124_d50b4d88",

The problem is that this matches:

https://github.com/openshift/origin-aggregated-logging/blob/dc3d434fb5fac29ba9b448b4985264c92c7b02d3/fluentd/configs.d/openshift/filter-retag-journal.conf#L74

  # mark non-kubernetes openshift-infra container logs as system logs
  rewriterule8 CONTAINER_NAME _openshift-infra_ journal.container._openshift-infra_
  # mark non-kubernetes openshift container logs as system logs
  rewriterule9 CONTAINER_NAME _openshift_ journal.container._openshift_

So as long as the CONTAINER_NAME does not contain _openshift_ or _openshift-infra_, there should be no problem.

Comment 5 Rich Megginson 2018-09-27 20:06:57 UTC
There is a workaround.

Step 1. make a copy of the original logging-fluentd configmap:

mkdir logging-fluentd-configmap

cd logging-fluentd-configmap

oc extract configmap/logging-fluentd --to=.

cp fluent.conf fluent.conf.orig

Step 2. make a copy of filter-retag-journal.conf from a fluentd pod - pick any fluentd pod and set fluentdpod=name of a fluentd pod

oc exec $fluentpod -- cat /etc/fluent/configs.d/openshift/filter-retag-journal.conf > filter-retag-journal.conf

Step 3. edit the file filter-retag-journal.conf to remove any lines containing the text "journal.container._openshift-infra_" or "journal.container._openshift_" e.g. remove lines like this:

rewriterule8 CONTAINER_NAME _openshift-infra_ journal.container._openshift-infra_

rewriterule9 CONTAINER_NAME _openshift_ journal.container._openshift_

The numbers in the "rewriteruleN" may be different.

Step 4. edit fluent.conf created in Step 1 to use the new filter-retag-journal.conf

Look for a line like this:

      @include configs.d/openshift/filter-retag-journal.conf

Change "openshift" to "user" like this:

      @include configs.d/user/filter-retag-journal.conf

Step 5. delete the logging-fluentd configmap - it's ok because we saved it in step 1

oc delete cm logging-fluentd

Step 6. create the logging-fluentd configmap with the new files

oc create configmap logging-fluentd --from-file=.

Step 7. redeploy fluentd - stop all fluentd

oc label node -l logging-infra-fluentd=true logging-infra-fluentd=false

then once all of the fluentd pods have stopped, start them

oc label node -l logging-infra-fluentd=false logging-infra-fluentd=true

Comment 7 Anping Li 2018-10-12 10:36:07 UTC
For sti build, the following four containers are created
Seq  Name
1) k8s_POD_eap-app-1-build_jboss_18e746a0-ce04-11e8-a6d3-fa163ee91ad7_0 	 
2) k8s_git-clone_eap-app-1-build_jboss_18e746a0-ce04-11e8-a6d3-fa163ee91ad7_0 	 
3) k8s_manage-dockerfile_eap-app-1-build_jboss_18e746a0-ce04-11e8-a6d3-fa163ee91ad7_0
4) k8s_sti-build_eap-app-1-build_jboss_18e746a0-ce04-11e8-a6d3-fa163ee91ad7_0
5) s2i_registry_access_redhat_com_jboss_eap_6_eap64_openshift_sha256_df298660f713250938716559f4df1bc85412c99964dbdca65c1fd4f56e12af0a_011cf541 	
6)k8s_deployment_eap-app-1-deploy_jboss_36ad3d2a-ce04-11e8-a6d3-fa163ee91ad7

For docker build, the following containers are created
Seq     Name
1) k8s_git-clone_ruby-hello-world-1-build_install-test_177cc797-ce08-11e8-a6d3-fa163ee91ad7_0
2) k8s_manage-dockerfile_ruby-hello-world-1-build_install-test_177cc797-ce08-11e8-a6d3-fa163ee91ad7_0
3) k8s_docker-build_ruby-hello-world-1-build_install-test_177cc797-ce08-11e8-a6d3-fa163ee91ad7_0
4) vigilant_pare
5) k8s_deployment_ruby-hello-world-1-deploy_install-test_2d6b97fd-ce08-11e8-a6d3-fa163ee91ad7_0

Comment 8 Anping Li 2018-10-12 12:06:28 UTC
The fix is not in logging-elasticsearch/images/v3.9.45-1. Waiting new puddles

Comment 9 Anping Li 2018-10-12 12:27:08 UTC
The sti_xx container logs have been send to ES. These documents are tagged with "kubernetes.container_name: fluentd-elasticsearch,kubernetes.pod_name: logging-fluentd-xxxx" wrongly.

And all logs from  s2i_xxx containers have been forward to k8s_sti-build_xxx_containers. No log is lost.

Comment 10 Anping Li 2018-10-12 12:28:29 UTC
The comment 9 base on the test result using v3.9.45-1. Waiting the fix to see what will happen.

Comment 11 Rich Megginson 2018-10-12 13:15:11 UTC
@anli - the problem only happens when you are using the journald docker log driver, and have the string _openshift_ or _openshift-infra_ in the CONTAINER_NAME.

Comment 13 Anping Li 2018-10-15 02:25:17 UTC
@rich, Yes, the comment 7-10 are base on Journald log driver.

Comment 19 errata-xmlrpc 2018-11-20 03:12:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2908


Note You need to log in before you can comment on or make changes to this bug.