Bug 1632130
| Summary: | [3.9] Fluentd cannot handle S2I Logs | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Noelle Frank <nalentor> | |
| Component: | Logging | Assignee: | Rich Megginson <rmeggins> | |
| Status: | CLOSED ERRATA | QA Contact: | Qiaoling Tang <qitang> | |
| Severity: | unspecified | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 3.9.0 | CC: | anli, aos-bugs, mtaru, rmeggins, stwalter, vlaad | |
| Target Milestone: | --- | |||
| Target Release: | 3.9.z | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | openshift3/logging-fluentd:v3.9.47-1 | Doc Type: | Bug Fix | |
| Doc Text: |
Cause: When using docker with the journald log driver, all container logs, including system and plain docker container logs, are logged to the journal, and read by fluentd.
Consequence: fluentd does not know how to handle these non-kubernetes container logs and throws exceptions.
Fix: Treat non-kubernetes container logs as logs from other system services e.g. send them to the .operations.* index.
Result: Logs from non-kubernetes containers are indexed correctly and do not cause any errors.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 1632361 (view as bug list) | Environment: | ||
| Last Closed: | 2018-11-20 03:12:03 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1632361, 1632364 | |||
PR merged upstream - waiting for next automated sync/build Setting to MODIFIED as per https://mojo.redhat.com/docs/DOC-1178565 "How do I get my Bugzilla Bug to VERIFIED?" Update: The bug only affects containers which have _openshift_ or _openshift-infra_ in the container name. From the original error message: "CONTAINER_NAME"=>"s2i_registry_access_redhat_com_jboss_eap_7_eap70_openshift_sha256_7b4d8986212601403ca07b320fd5dadb9f624298251620b8f6e2b55f993d9124_d50b4d88", The problem is that this matches: https://github.com/openshift/origin-aggregated-logging/blob/dc3d434fb5fac29ba9b448b4985264c92c7b02d3/fluentd/configs.d/openshift/filter-retag-journal.conf#L74 # mark non-kubernetes openshift-infra container logs as system logs rewriterule8 CONTAINER_NAME _openshift-infra_ journal.container._openshift-infra_ # mark non-kubernetes openshift container logs as system logs rewriterule9 CONTAINER_NAME _openshift_ journal.container._openshift_ So as long as the CONTAINER_NAME does not contain _openshift_ or _openshift-infra_, there should be no problem. There is a workaround.
Step 1. make a copy of the original logging-fluentd configmap:
mkdir logging-fluentd-configmap
cd logging-fluentd-configmap
oc extract configmap/logging-fluentd --to=.
cp fluent.conf fluent.conf.orig
Step 2. make a copy of filter-retag-journal.conf from a fluentd pod - pick any fluentd pod and set fluentdpod=name of a fluentd pod
oc exec $fluentpod -- cat /etc/fluent/configs.d/openshift/filter-retag-journal.conf > filter-retag-journal.conf
Step 3. edit the file filter-retag-journal.conf to remove any lines containing the text "journal.container._openshift-infra_" or "journal.container._openshift_" e.g. remove lines like this:
rewriterule8 CONTAINER_NAME _openshift-infra_ journal.container._openshift-infra_
rewriterule9 CONTAINER_NAME _openshift_ journal.container._openshift_
The numbers in the "rewriteruleN" may be different.
Step 4. edit fluent.conf created in Step 1 to use the new filter-retag-journal.conf
Look for a line like this:
@include configs.d/openshift/filter-retag-journal.conf
Change "openshift" to "user" like this:
@include configs.d/user/filter-retag-journal.conf
Step 5. delete the logging-fluentd configmap - it's ok because we saved it in step 1
oc delete cm logging-fluentd
Step 6. create the logging-fluentd configmap with the new files
oc create configmap logging-fluentd --from-file=.
Step 7. redeploy fluentd - stop all fluentd
oc label node -l logging-infra-fluentd=true logging-infra-fluentd=false
then once all of the fluentd pods have stopped, start them
oc label node -l logging-infra-fluentd=false logging-infra-fluentd=true
For sti build, the following four containers are created Seq Name 1) k8s_POD_eap-app-1-build_jboss_18e746a0-ce04-11e8-a6d3-fa163ee91ad7_0 2) k8s_git-clone_eap-app-1-build_jboss_18e746a0-ce04-11e8-a6d3-fa163ee91ad7_0 3) k8s_manage-dockerfile_eap-app-1-build_jboss_18e746a0-ce04-11e8-a6d3-fa163ee91ad7_0 4) k8s_sti-build_eap-app-1-build_jboss_18e746a0-ce04-11e8-a6d3-fa163ee91ad7_0 5) s2i_registry_access_redhat_com_jboss_eap_6_eap64_openshift_sha256_df298660f713250938716559f4df1bc85412c99964dbdca65c1fd4f56e12af0a_011cf541 6)k8s_deployment_eap-app-1-deploy_jboss_36ad3d2a-ce04-11e8-a6d3-fa163ee91ad7 For docker build, the following containers are created Seq Name 1) k8s_git-clone_ruby-hello-world-1-build_install-test_177cc797-ce08-11e8-a6d3-fa163ee91ad7_0 2) k8s_manage-dockerfile_ruby-hello-world-1-build_install-test_177cc797-ce08-11e8-a6d3-fa163ee91ad7_0 3) k8s_docker-build_ruby-hello-world-1-build_install-test_177cc797-ce08-11e8-a6d3-fa163ee91ad7_0 4) vigilant_pare 5) k8s_deployment_ruby-hello-world-1-deploy_install-test_2d6b97fd-ce08-11e8-a6d3-fa163ee91ad7_0 The fix is not in logging-elasticsearch/images/v3.9.45-1. Waiting new puddles The sti_xx container logs have been send to ES. These documents are tagged with "kubernetes.container_name: fluentd-elasticsearch,kubernetes.pod_name: logging-fluentd-xxxx" wrongly. And all logs from s2i_xxx containers have been forward to k8s_sti-build_xxx_containers. No log is lost. The comment 9 base on the test result using v3.9.45-1. Waiting the fix to see what will happen. @anli - the problem only happens when you are using the journald docker log driver, and have the string _openshift_ or _openshift-infra_ in the CONTAINER_NAME. @rich, Yes, the comment 7-10 are base on Journald log driver. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2908 |
Description of problem: Fluentd does not handle logs from "s2i" containers. Fluentd expects all container logs from the journal have a CONTAINER_NAME value with a "k8s_" prefix - it doesn't know how to handle the "s2i_" prefix. Version-Release number of selected component (if applicable): OCP 3.9.27 How reproducible: Create S2I pod and look for logs in Kibana Steps to Reproduce: 1. Create a project/namespace. 2. Create a couple S2I pods in project along with other pods, can use console and pick from catalog. 3. Later, go to Kibana with the user owner of the project or an user with cluster admin role. Actual results: Logs will be visible for some pods, but not S2I pods. Expected results: Logs should be visible for all pods. Additional info: Abstract from our cluster's logs: 2018-09-19 07:48:17 -0400 [warn]: dump an error event: error_class=TypeError error="no implicit conversion of nil into String" location="/usr/share/gems/gems/fluent-plugin-viaq_data_model-0.0.14/lib/fluent/plugin/filter_viaq_data_model.rb:413:in `parse'" tag="journal.container._openshift_" time=1536076266 record={"PRIORITY"=>"6", "_TRANSPORT"=>"journal", "_PID"=>"10183", "_UID"=>"0", "_GID"=>"0", "_COMM"=>"dockerd-current", "_EXE"=>"/usr/bin/dockerd-current", "_CMDLINE"=>"/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --selinux-enabled --log-driver=journald --signature-verification=False --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker--vg-docker--pool --storage-opt dm.use_deferred_removal=true --mtu=8951 --add-registry registry.access.redhat.com --add-registry registry.access.redhat.com", "_CAP_EFFECTIVE"=>"1fffffffff", "_SYSTEMD_CGROUP"=>"/system.slice/docker.service", "_SYSTEMD_UNIT"=>"docker.service", "_SYSTEMD_SLICE"=>"system.slice", "_SELINUX_CONTEXT"=>"system_u:system_r:container_runtime_t:s0", "_BOOT_ID"=>"500dce08e3ef491b9107b3d389cd2966", "_MACHINE_ID"=>"cec1cf01289d4a53b9488a651ddca907", "_HOSTNAME"=>"node9.rhpds.internal", "CONTAINER_ID"=>"448aa02d4d4a", "CONTAINER_ID_FULL"=>"448aa02d4d4abf1a6a07590fb8a400196501a49eaf38173de0b89dca4f564c03", "CONTAINER_NAME"=>"s2i_registry_access_redhat_com_jboss_eap_7_eap70_openshift_sha256_7b4d8986212601403ca07b320fd5dadb9f624298251620b8f6e2b55f993d9124_d50b4d88", "MESSAGE"=>"[INFO] Downloading: http://nexus.opentlc-shared.svc.cluster.local:8081/repository/maven-all-public/org/jboss/spec/jboss-javaee-7.0/1.0.3.Final-redhat-2/jboss-javaee-7.0-1.0.3.Final-redhat-2.pom", "_SOURCE_REALTIME_TIMESTAMP"=>"1536076266567114", "pipeline_metadata"=>{"collector"=>{"ipaddr4"=>"10.1.21.57", "ipaddr6"=>"fe80::9889:7aff:fe35:4a94", "inputname"=>"fluent-plugin-systemd", "name"=>"fluentd", "received_at"=>"2018-09-19T11:48:17.193205+00:00", "version"=>"0.12.43 1.6.0"}}}