Bug 1632130
| Summary: | [3.9] Fluentd cannot handle S2I Logs | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Noelle Frank <nalentor> |
| Component: | Logging | Assignee: | Rich Megginson <rmeggins> |
| Status: | CLOSED ERRATA | QA Contact: | Qiaoling Tang <qitang> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.9.0 | CC: | anli, aos-bugs, mtaru, rmeggins, stwalter, vlaad |
| Target Milestone: | --- | | |
| Target Release: | 3.9.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openshift3/logging-fluentd:v3.9.47-1 | Doc Type: | Bug Fix |

Doc Text:

Cause: When using docker with the journald log driver, all container logs, including system and plain docker container logs, are logged to the journal and read by fluentd.
Consequence: fluentd does not know how to handle these non-kubernetes container logs and throws exceptions.
Fix: Treat non-kubernetes container logs as logs from other system services, e.g. send them to the .operations.* index.
Result: Logs from non-kubernetes containers are indexed correctly and do not cause any errors.

| Story Points: | --- | | |
|---|---|---|---|
| Clone Of: | | | |
| : | 1632361 | Environment: | |
| Last Closed: | 2018-11-20 03:12:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1632361, 1632364 | | |
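Note: the Doc Text above applies only when docker runs with the journald log driver. A minimal sketch of checking which driver a node uses, assuming shell access to the node and the docker CLI (the `/etc/sysconfig/docker` location is the usual one on RHEL-based OpenShift 3.x nodes):

```bash
# Print the active log driver; "journald" means container logs go to the
# journal, where this bug applies (the json-file driver does not trigger it).
docker info --format '{{.LoggingDriver}}'

# On RHEL-based nodes the driver is typically configured here:
grep -- '--log-driver' /etc/sysconfig/docker
```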
Description
Noelle Frank
2018-09-24 07:52:26 UTC
PR merged upstream - waiting for next automated sync/build.

Setting to MODIFIED as per https://mojo.redhat.com/docs/DOC-1178565 "How do I get my Bugzilla Bug to VERIFIED?"

Update: The bug only affects containers which have _openshift_ or _openshift-infra_ in the container name. From the original error message:

```
"CONTAINER_NAME"=>"s2i_registry_access_redhat_com_jboss_eap_7_eap70_openshift_sha256_7b4d8986212601403ca07b320fd5dadb9f624298251620b8f6e2b55f993d9124_d50b4d88"
```

The problem is that this name matches these rules in https://github.com/openshift/origin-aggregated-logging/blob/dc3d434fb5fac29ba9b448b4985264c92c7b02d3/fluentd/configs.d/openshift/filter-retag-journal.conf#L74:

```
# mark non-kubernetes openshift-infra container logs as system logs
rewriterule8 CONTAINER_NAME _openshift-infra_ journal.container._openshift-infra_
# mark non-kubernetes openshift container logs as system logs
rewriterule9 CONTAINER_NAME _openshift_ journal.container._openshift_
```

So as long as the CONTAINER_NAME does not contain _openshift_ or _openshift-infra_, there should be no problem.

There is a workaround (a consolidated script follows after the steps):

Step 1. Make a copy of the original logging-fluentd configmap:

```
mkdir logging-fluentd-configmap
cd logging-fluentd-configmap
oc extract configmap/logging-fluentd --to=.
cp fluent.conf fluent.conf.orig
```

Step 2. Make a copy of filter-retag-journal.conf from a fluentd pod - pick any fluentd pod and set fluentdpod to its name:

```
oc exec $fluentdpod -- cat /etc/fluent/configs.d/openshift/filter-retag-journal.conf > filter-retag-journal.conf
```

Step 3. Edit filter-retag-journal.conf to remove any lines containing the text "journal.container._openshift-infra_" or "journal.container._openshift_", e.g. remove lines like these:

```
rewriterule8 CONTAINER_NAME _openshift-infra_ journal.container._openshift-infra_
rewriterule9 CONTAINER_NAME _openshift_ journal.container._openshift_
```

The numbers in "rewriteruleN" may be different.

Step 4. Edit the fluent.conf saved in Step 1 to use the new filter-retag-journal.conf. Look for a line like this:

```
@include configs.d/openshift/filter-retag-journal.conf
```

and change "openshift" to "user" like this:

```
@include configs.d/user/filter-retag-journal.conf
```

Step 5. Delete the logging-fluentd configmap - this is safe because we saved a copy in Step 1:

```
oc delete cm logging-fluentd
```

Step 6. Create the logging-fluentd configmap from the new files:

```
oc create configmap logging-fluentd --from-file=.
```

Step 7. Redeploy fluentd. First stop all fluentd pods:

```
oc label node -l logging-infra-fluentd=true logging-infra-fluentd=false
```

then, once all of the fluentd pods have stopped, start them again:

```
oc label node -l logging-infra-fluentd=false logging-infra-fluentd=true
```
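For convenience, here is a minimal, untested sketch that consolidates Steps 1-7 into one script. It assumes oc is logged in with access to a project named "logging", that fluentd pods carry the component=fluentd label (as in the 3.x aggregated-logging deployment), and it adds --overwrite to the label commands since the label already exists on the nodes; adjust for your environment.

```bash
#!/bin/bash
# Sketch of the workaround above; assumes the "logging" project and the
# component=fluentd pod label used by the 3.x aggregated-logging stack.
set -euo pipefail

oc project logging

# Step 1: save the original configmap contents.
mkdir -p logging-fluentd-configmap
cd logging-fluentd-configmap
oc extract configmap/logging-fluentd --to=.
cp fluent.conf fluent.conf.orig

# Step 2: copy filter-retag-journal.conf out of a running fluentd pod.
fluentdpod=$(oc get pods -l component=fluentd \
    -o jsonpath='{.items[0].metadata.name}')
oc exec "$fluentdpod" -- \
    cat /etc/fluent/configs.d/openshift/filter-retag-journal.conf \
    > filter-retag-journal.conf

# Step 3: drop the _openshift_ / _openshift-infra_ rewrite rules.
# (The explanatory comment lines are left behind; they are harmless.)
sed -i -e '/journal\.container\._openshift-infra_/d' \
       -e '/journal\.container\._openshift_/d' filter-retag-journal.conf

# Step 4: point fluent.conf at the user copy of the filter file.
sed -i 's|configs.d/openshift/filter-retag-journal.conf|configs.d/user/filter-retag-journal.conf|' fluent.conf

# Steps 5-6: recreate the configmap from the edited files.
oc delete cm logging-fluentd
oc create configmap logging-fluentd --from-file=.

# Step 7: restart fluentd by toggling the node label
# (--overwrite because the label is already set on the nodes).
oc label node -l logging-infra-fluentd=true \
    logging-infra-fluentd=false --overwrite
while [ -n "$(oc get pods -l component=fluentd -o name 2>/dev/null)" ]; do
    sleep 5   # wait for all fluentd pods to terminate
done
oc label node -l logging-infra-fluentd=false \
    logging-infra-fluentd=true --overwrite
```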
For an sti build, the following containers are created:

```
Seq  Name
1)   k8s_POD_eap-app-1-build_jboss_18e746a0-ce04-11e8-a6d3-fa163ee91ad7_0
2)   k8s_git-clone_eap-app-1-build_jboss_18e746a0-ce04-11e8-a6d3-fa163ee91ad7_0
3)   k8s_manage-dockerfile_eap-app-1-build_jboss_18e746a0-ce04-11e8-a6d3-fa163ee91ad7_0
4)   k8s_sti-build_eap-app-1-build_jboss_18e746a0-ce04-11e8-a6d3-fa163ee91ad7_0
5)   s2i_registry_access_redhat_com_jboss_eap_6_eap64_openshift_sha256_df298660f713250938716559f4df1bc85412c99964dbdca65c1fd4f56e12af0a_011cf541
6)   k8s_deployment_eap-app-1-deploy_jboss_36ad3d2a-ce04-11e8-a6d3-fa163ee91ad7
```

For a docker build, the following containers are created:

```
Seq  Name
1)   k8s_git-clone_ruby-hello-world-1-build_install-test_177cc797-ce08-11e8-a6d3-fa163ee91ad7_0
2)   k8s_manage-dockerfile_ruby-hello-world-1-build_install-test_177cc797-ce08-11e8-a6d3-fa163ee91ad7_0
3)   k8s_docker-build_ruby-hello-world-1-build_install-test_177cc797-ce08-11e8-a6d3-fa163ee91ad7_0
4)   vigilant_pare
5)   k8s_deployment_ruby-hello-world-1-deploy_install-test_2d6b97fd-ce08-11e8-a6d3-fa163ee91ad7_0
```

The fix is not in logging-elasticsearch/images/v3.9.45-1. Waiting for new puddles.

The s2i_xx container logs have been sent to ES. These documents are wrongly tagged with "kubernetes.container_name: fluentd-elasticsearch, kubernetes.pod_name: logging-fluentd-xxxx", and all logs from s2i_xxx containers have been forwarded to the k8s_sti-build_xxx containers. No log is lost. Comment 9 is based on the test result using v3.9.45-1. Waiting for the fix to see what will happen.

@anli - the problem only happens when you are using the journald docker log driver and have the string _openshift_ or _openshift-infra_ in the CONTAINER_NAME.

@rich, yes, comments 7-10 are based on the journald log driver.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2908
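For anyone verifying a similar setup, here is a hedged sketch of checking that operations logs are being indexed after the fix (per the Doc Text, non-kubernetes container logs should go to the .operations.* index). The "logging" project, the component=es pod label, and the admin-* cert paths are the defaults of the 3.x aggregated-logging stack; adjust as needed.

```bash
# Sketch: count documents in the .operations.* indices from inside an ES pod.
# Assumes the "logging" project, the component=es pod label, and the admin
# client certs mounted at /etc/elasticsearch/secret (3.x logging defaults).
espod=$(oc -n logging get pods -l component=es \
    -o jsonpath='{.items[0].metadata.name}')
oc -n logging exec "$espod" -- curl -s \
    --cacert /etc/elasticsearch/secret/admin-ca \
    --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    'https://localhost:9200/.operations.*/_count?pretty'
```

A count that rises while an s2i build runs, together with fluentd pod logs (oc logs on a fluentd pod) showing no exceptions, would indicate the non-kubernetes container logs are being handled as operations logs rather than crashing fluentd.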