Description of problem:
Fluentd does not handle logs from "s2i" containers. Fluentd expects all container logs from the journal to have a CONTAINER_NAME value with a "k8s_" prefix; it does not know how to handle the "s2i_" prefix.

Version-Release number of selected component (if applicable):
OCP 3.9.27

How reproducible:
Create an S2I pod and look for its logs in Kibana.

Steps to Reproduce:
1. Create a project/namespace.
2. Create a couple of S2I pods in the project along with other pods; you can use the console and pick from the catalog.
3. Later, go to Kibana as the user who owns the project, or as a user with the cluster-admin role.

Actual results:
Logs are visible for some pods, but not for the S2I pods.

Expected results:
Logs should be visible for all pods.

Additional info:
Excerpt from our cluster's logs:

2018-09-19 07:48:17 -0400 [warn]: dump an error event: error_class=TypeError error="no implicit conversion of nil into String" location="/usr/share/gems/gems/fluent-plugin-viaq_data_model-0.0.14/lib/fluent/plugin/filter_viaq_data_model.rb:413:in `parse'" tag="journal.container._openshift_" time=1536076266 record={"PRIORITY"=>"6", "_TRANSPORT"=>"journal", "_PID"=>"10183", "_UID"=>"0", "_GID"=>"0", "_COMM"=>"dockerd-current", "_EXE"=>"/usr/bin/dockerd-current", "_CMDLINE"=>"/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --selinux-enabled --log-driver=journald --signature-verification=False --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker--vg-docker--pool --storage-opt dm.use_deferred_removal=true --mtu=8951 --add-registry registry.access.redhat.com --add-registry registry.access.redhat.com", "_CAP_EFFECTIVE"=>"1fffffffff", "_SYSTEMD_CGROUP"=>"/system.slice/docker.service", "_SYSTEMD_UNIT"=>"docker.service", "_SYSTEMD_SLICE"=>"system.slice", "_SELINUX_CONTEXT"=>"system_u:system_r:container_runtime_t:s0", "_BOOT_ID"=>"500dce08e3ef491b9107b3d389cd2966", "_MACHINE_ID"=>"cec1cf01289d4a53b9488a651ddca907", "_HOSTNAME"=>"node9.rhpds.internal", "CONTAINER_ID"=>"448aa02d4d4a", "CONTAINER_ID_FULL"=>"448aa02d4d4abf1a6a07590fb8a400196501a49eaf38173de0b89dca4f564c03", "CONTAINER_NAME"=>"s2i_registry_access_redhat_com_jboss_eap_7_eap70_openshift_sha256_7b4d8986212601403ca07b320fd5dadb9f624298251620b8f6e2b55f993d9124_d50b4d88", "MESSAGE"=>"[INFO] Downloading: http://nexus.opentlc-shared.svc.cluster.local:8081/repository/maven-all-public/org/jboss/spec/jboss-javaee-7.0/1.0.3.Final-redhat-2/jboss-javaee-7.0-1.0.3.Final-redhat-2.pom", "_SOURCE_REALTIME_TIMESTAMP"=>"1536076266567114", "pipeline_metadata"=>{"collector"=>{"ipaddr4"=>"10.1.21.57", "ipaddr6"=>"fe80::9889:7aff:fe35:4a94", "inputname"=>"fluent-plugin-systemd", "name"=>"fluentd", "received_at"=>"2018-09-19T11:48:17.193205+00:00", "version"=>"0.12.43 1.6.0"}}}
PR merged upstream - waiting for next automated sync/build
https://github.com/openshift/origin-aggregated-logging/pull/1360
Setting to MODIFIED as per https://mojo.redhat.com/docs/DOC-1178565 "How do I get my Bugzilla Bug to VERIFIED?"
Update: The bug only affects containers which have _openshift_ or _openshift-infra_ in the container name. From the original error message:

"CONTAINER_NAME"=>"s2i_registry_access_redhat_com_jboss_eap_7_eap70_openshift_sha256_7b4d8986212601403ca07b320fd5dadb9f624298251620b8f6e2b55f993d9124_d50b4d88"

The problem is that this name matches the rules at https://github.com/openshift/origin-aggregated-logging/blob/dc3d434fb5fac29ba9b448b4985264c92c7b02d3/fluentd/configs.d/openshift/filter-retag-journal.conf#L74 :

# mark non-kubernetes openshift-infra container logs as system logs
rewriterule8 CONTAINER_NAME _openshift-infra_ journal.container._openshift-infra_
# mark non-kubernetes openshift container logs as system logs
rewriterule9 CONTAINER_NAME _openshift_ journal.container._openshift_

So as long as the CONTAINER_NAME does not contain _openshift_ or _openshift-infra_, there should be no problem.
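The rewriterule patterns above are plain substring matches, so the retagging can be reproduced with ordinary Ruby string checks (this is just an illustration of the matching, not fluentd itself):

```ruby
name = "s2i_registry_access_redhat_com_jboss_eap_7_eap70_openshift_" \
       "sha256_7b4d8986212601403ca07b320fd5dadb9f624298251620b8f6e2b55f993d9124_d50b4d88"

name.start_with?("k8s_")           # false: not handled as a kubernetes container log
name.include?("_openshift-infra_") # false: rewriterule8 does not fire
name.include?("_openshift_")       # true:  rewriterule9 retags it journal.container._openshift_
```

The last check is why the error event in the original report carries tag="journal.container._openshift_".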
There is a workaround.

Step 1. Make a copy of the original logging-fluentd configmap:

  mkdir logging-fluentd-configmap
  cd logging-fluentd-configmap
  oc extract configmap/logging-fluentd --to=.
  cp fluent.conf fluent.conf.orig

Step 2. Make a copy of filter-retag-journal.conf from a fluentd pod. Pick any fluentd pod and set fluentdpod to its name:

  oc exec $fluentdpod -- cat /etc/fluent/configs.d/openshift/filter-retag-journal.conf > filter-retag-journal.conf

Step 3. Edit the file filter-retag-journal.conf to remove any lines containing the text "journal.container._openshift-infra_" or "journal.container._openshift_", e.g. remove lines like these:

  rewriterule8 CONTAINER_NAME _openshift-infra_ journal.container._openshift-infra_
  rewriterule9 CONTAINER_NAME _openshift_ journal.container._openshift_

The numbers in "rewriteruleN" may be different.

Step 4. Edit the fluent.conf created in Step 1 to use the new filter-retag-journal.conf. Look for a line like this:

  @include configs.d/openshift/filter-retag-journal.conf

and change "openshift" to "user" like this:

  @include configs.d/user/filter-retag-journal.conf

Step 5. Delete the logging-fluentd configmap (this is safe because we saved a copy in Step 1):

  oc delete cm logging-fluentd

Step 6. Create the logging-fluentd configmap from the new files:

  oc create configmap logging-fluentd --from-file=.

Step 7. Redeploy fluentd. First stop all fluentd pods:

  oc label node -l logging-infra-fluentd=true logging-infra-fluentd=false

then, once all of the fluentd pods have stopped, start them again:

  oc label node -l logging-infra-fluentd=false logging-infra-fluentd=true
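The edit in Step 3 amounts to dropping every line that mentions either problem tag. A small Ruby sketch of that edit (the config sample here is invented for illustration except for the two rewriterule lines quoted from the bug; rule numbers may vary in a real config):

```ruby
# Sample config text; the surrounding lines are made up for this example.
conf = <<~CONF
  # mark non-kubernetes openshift-infra container logs as system logs
  rewriterule8 CONTAINER_NAME _openshift-infra_ journal.container._openshift-infra_
  # mark non-kubernetes openshift container logs as system logs
  rewriterule9 CONTAINER_NAME _openshift_ journal.container._openshift_
  rewriterule10 CONTAINER_NAME .+ journal.container
CONF

# Drop any line mentioning either of the problem tags, keep everything else.
filtered = conf.lines.reject { |l|
  l.include?("journal.container._openshift_") ||
  l.include?("journal.container._openshift-infra_")
}.join
```

Only the two retag rules disappear; unrelated rules such as the final catch-all survive the edit.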
For an sti build, the following containers are created:

Seq Name
1) k8s_POD_eap-app-1-build_jboss_18e746a0-ce04-11e8-a6d3-fa163ee91ad7_0
2) k8s_git-clone_eap-app-1-build_jboss_18e746a0-ce04-11e8-a6d3-fa163ee91ad7_0
3) k8s_manage-dockerfile_eap-app-1-build_jboss_18e746a0-ce04-11e8-a6d3-fa163ee91ad7_0
4) k8s_sti-build_eap-app-1-build_jboss_18e746a0-ce04-11e8-a6d3-fa163ee91ad7_0
5) s2i_registry_access_redhat_com_jboss_eap_6_eap64_openshift_sha256_df298660f713250938716559f4df1bc85412c99964dbdca65c1fd4f56e12af0a_011cf541
6) k8s_deployment_eap-app-1-deploy_jboss_36ad3d2a-ce04-11e8-a6d3-fa163ee91ad7

For a docker build, the following containers are created:

Seq Name
1) k8s_git-clone_ruby-hello-world-1-build_install-test_177cc797-ce08-11e8-a6d3-fa163ee91ad7_0
2) k8s_manage-dockerfile_ruby-hello-world-1-build_install-test_177cc797-ce08-11e8-a6d3-fa163ee91ad7_0
3) k8s_docker-build_ruby-hello-world-1-build_install-test_177cc797-ce08-11e8-a6d3-fa163ee91ad7_0
4) vigilant_pare
5) k8s_deployment_ruby-hello-world-1-deploy_install-test_2d6b97fd-ce08-11e8-a6d3-fa163ee91ad7_0
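Grouping a few of the names above by prefix makes the gap visible (a hypothetical Ruby illustration, using names taken from the lists):

```ruby
names = [
  "k8s_POD_eap-app-1-build_jboss_18e746a0-ce04-11e8-a6d3-fa163ee91ad7_0",
  "k8s_sti-build_eap-app-1-build_jboss_18e746a0-ce04-11e8-a6d3-fa163ee91ad7_0",
  "s2i_registry_access_redhat_com_jboss_eap_6_eap64_openshift_sha256_df298660f713250938716559f4df1bc85412c99964dbdca65c1fd4f56e12af0a_011cf541",
  "vigilant_pare",  # docker builds get an auto-generated name for the build container
]

kube, other = names.partition { |n| n.start_with?("k8s_") }
# kube  -> the k8s_ names, handled by the normal kubernetes log pipeline
# other -> the s2i_ name and "vigilant_pare", which k8s_-only retag rules miss
```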
The fix is not in logging-elasticsearch/images/v3.9.45-1. Waiting for new puddles.
The s2i_xx container logs have been sent to ES, but these documents are wrongly tagged with "kubernetes.container_name: fluentd-elasticsearch, kubernetes.pod_name: logging-fluentd-xxxx". All logs from the s2i_xxx containers have been forwarded to the k8s_sti-build_xxx containers. No logs are lost.
Comment 9 is based on test results using v3.9.45-1. Waiting for the fix to see what will happen.
@anli - the problem only happens when you are using the journald docker log driver, and have the string _openshift_ or _openshift-infra_ in the CONTAINER_NAME.
@rich, yes, comments 7-10 are based on the journald log driver.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2908