Bug 1632648

Summary: [3.11] Record is missing kubernetes field when use '--log-driver journald' in /etc/sysconfig/docker
Product: OpenShift Container Platform Reporter: Anping Li <anli>
Component: LoggingAssignee: Rich Megginson <rmeggins>
Status: CLOSED ERRATA QA Contact: Anping Li <anli>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.11.0CC: anli, aos-bugs, jcantril, qitang, rmeggins
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openshift3/ose-logging-fluentd:v3.11.23-1 Doc Type: Bug Fix
Doc Text:
Cause: When using docker with log-driver journald, the setting in /etc/sysconfig/docker has changed to use --log-driver journald instead of --log-driver=journald Consequence: Fluentd cannot detect that journald is being used, so assumes json-file, and cannot read any kubernetes metadata because it doesn't look for the journald CONTAINER_NAME field. The result is lots of fluentd errors. Fix: Change the way Fluentd detects the docker log driver so that it looks for --log-driver journald in addition to --log-driver=journald. Result: Fluentd can detect the docker log driver, and can correctly process Kubernetes container logs.
Story Points: ---
Clone Of:
: 1638899 (view as bug list) Environment:
Last Closed: 2018-11-20 03:10:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1638899, 1638900, 1638902    

Description Anping Li 2018-09-25 10:03:31 UTC
Description of problem:
When the Docker log driver is journald. the fluentd couldn't get the kubernetes fields for the container logs. all container records are sent to .orphaned index


Version-Release number of selected component (if applicable):
openshift3/logging-fluentd/images/v3.10.45-2
openshift3/ose-logging-fluentd/images/v3.11.14-1

How reproducible:
Always

Steps to Reproduce:
1. Deploy Openshift using jouranld log driver
 [OSEv3:vars]
   openshift_docker_log_driver=journald
2. Deploy logging and deploy applications in project

3. List the index
 https://localhost:9200/_cat/shards
.operations.2018.09.25 0 p STARTED 1118247  1.2gb 10.129.0.16 logging-es-data-master-dxqfct3g
.orphaned.2018.09.25   0 p STARTED   12632 11.4mb 10.129.0.16 logging-es-data-master-dxqfct3g
.kibana                0 p STARTED       1  3.2kb 10.129.0.16 logging-es-data-master-dxqfct3g
.searchguard           0 p STARTED       5 32.2kb 10.129.0.16 logging-es-data-master-dxqfct3g

4. Check the fluentd debug logs

2018-09-25 05:43:40 -0400 [error]: plugin/filter_viaq_data_model.rb:398:block in add_elasticsearch_index_name_field: record cannot use elasticsearch index name type project_full: record is missing kubernetes field: {"CONTAINER_TAG"=>"360c189e376f", "systemd"=>{"t"=>{"BOOT_ID"=>"2d2638ef904a43578373ae9374bdc19f", "CAP_EFFECTIVE"=>"1fffffffff", "CMDLINE"=>"/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --init-path=/usr/libexec/docker/docker-init-current --seccomp-profile=/etc/docker/seccomp.json --selinux-enabled --log-driver journald --signature-verification=False --storage-driver overlay2 --mtu=8951 --add-registry registry.redhat.io --block-registry registry.hacker.com --insecure-registry brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888 --insecure-registry virt-openshift-05.lab.eng.nay.redhat.com:5000 --insecure-registry virt-openshift-05.lab.eng.nay.redhat.com:5001 --insecure-registry registry.reg-aws.openshift.com:443 --insecure-registry asb-registry.usersys.redhat.com:5000 --add-registry registry.access.redhat.com", "COMM"=>"dockerd-current", "EXE"=>"/usr/bin/dockerd-current", "GID"=>"0", "MACHINE_ID"=>"c8bb65be446e4c10a95db25ee0c9565d", "PID"=>"4071", "SELINUX_CONTEXT"=>"system_u:system_r:container_runtime_t:s0", "SYSTEMD_CGROUP"=>"/system.slice/docker.service", "SYSTEMD_SLICE"=>"system.slice", "SYSTEMD_UNIT"=>"docker.service", "TRANSPORT"=>"journal", "UID"=>"0"}}, "level"=>"info", "message"=>"2018-09-25 09:43:40,139 - SVTLogger - INFO - centos-logtest-76krn : 23916 : kTwY7XX9N ftazV4wq2 IusGtcLAu 7hr1kIzF8 9SozNh3tW mKUPLIrer cVjcYfuRz CKdSu73ef 4TxEPnjs0 eGz55XNc9 lqqFfEKAg 3h7DvZnmb TXO2sV521 az2NbHaUA 6ha50o5pP MVzKooeDR iMIozQxC4 jtAYFJMrX KU4e7tssG OZdppkafl ", "hostname"=>"ip-172-18-4-174.ec2.internal", "pipeline_metadata"=>{"collector"=>{"ipaddr4"=>"10.129.0.18", "ipaddr6"=>"fe80::c0ac:94ff:fe05:6fd4", "inputname"=>"fluent-plugin-systemd", "name"=>"fluentd", "received_at"=>"2018-09-25T09:43:40.934757+00:00", "version"=>"0.12.43 1.6.0"}}, "@timestamp"=>"2018-09-25T09:43:40.139658+00:00"}
2018-09-25 05:43:41 -0400 [error]: plugin/filter_viaq_data_model.rb:398:block in add_elasticsearch_index_name_field: record cannot use elasticsearch index name type project_full: record is missing kubernetes field: {"CONTAINER_TAG"=>"360c189e376f", "systemd"=>{"t"=>{"BOOT_ID"=>"2d2638ef904a43578373ae9374bdc19f", "CAP_EFFECTIVE"=>"1fffffffff", "CMDLINE"=>"/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --init-path=/usr/libexec/docker/docker-init-current --seccomp-profile=/etc/docker/seccomp.json --selinux-enabled --log-driver journald --signature-verification=False --storage-driver overlay2 --mtu=8951 --add-registry registry.redhat.io --block-registry registry.hacker.com --insecure-registry brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888 --insecure-registry virt-openshift-05.lab.eng.nay.redhat.com:5000 --insecure-registry virt-openshift-05.lab.eng.nay.redhat.com:5001 --insecure-registry registry.reg-aws.openshift.com:443 --insecure-registry asb-registry.usersys.redhat.com:5000 --add-registry registry.access.redhat.com", "COMM"=>"dockerd-current", "EXE"=>"/usr/bin/dockerd-current", "GID"=>"0", "MACHINE_ID"=>"c8bb65be446e4c10a95db25ee0c9565d", "PID"=>"4071", "SELINUX_CONTEXT"=>"system_u:system_r:container_runtime_t:s0", "SYSTEMD_CGROUP"=>"/system.slice/docker.service", "SYSTEMD_SLICE"=>"system.slice", "SYSTEMD_UNIT"=>"docker.service", "TRANSPORT"=>"journal", "UID"=>"0"}}, "level"=>"info", "message"=>"2018-09-25 09:43:41,140 - SVTLogger - INFO - centos-logtest-76krn : 23917 : kTwY7XX9N ftazV4wq2 IusGtcLAu 7hr1kIzF8 9SozNh3tW mKUPLIrer cVjcYfuRz CKdSu73ef 4TxEPnjs0 eGz55XNc9 lqqFfEKAg 3h7DvZnmb TXO2sV521 az2NbHaUA 6ha50o5pP MVzKooeDR iMIozQxC4 jtAYFJMrX KU4e7tssG OZdppkafl ", "hostname"=>"ip-172-18-4-174.ec2.internal", "pipeline_metadata"=>{"collector"=>{"ipaddr4"=>"10.129.0.18", "ipaddr6"=>"fe80::c0ac:94ff:fe05:6fd4", "inputname"=>"fluent-plugin-systemd", "name"=>"fluentd", "received_at"=>"2018-09-25T09:43:41.980706+00:00", "version"=>"0.12.43 1.6.0"}}, "@timestamp"=>"2018-09-25T09:43:41.140979+00:00"}
Actual results:
No project.${projectnane}.${uuid}.${date} index in Elastcisearch

Expected results:
The container logs are hosted in indices named project.${projectnane}.${uuid}.${date}

Additional info:

Comment 1 Jeff Cantrill 2018-09-25 12:33:25 UTC
(In reply to Anping Li from comment #0)

> 
> Version-Release number of selected component (if applicable):
> openshift3/logging-fluentd/images/v3.10.45-2
> openshift3/ose-logging-fluentd/images/v3.11.14-1

Which version are you reporting the issue against?

Comment 2 Anping Li 2018-09-25 12:47:22 UTC
@jeff,  In v3.11,  if confirmed , the bug should be cloned to v3.10.

Comment 3 Rich Megginson 2018-09-25 13:10:52 UTC
Is this a regression?

Comment 4 Anping Li 2018-09-25 13:17:27 UTC
Yes, It work prior.

Comment 5 Rich Megginson 2018-09-25 15:03:51 UTC
I cannot reproduce.  I'm using the latest available 3.11 packages/images:

# docker images|grep fluentd
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-logging-fluentd           v3.11               98f5f5f31f7d        36 hours ago        286 MB

NOTE WELL: I do not have the 3.10 image you have listed above:

openshift3/logging-fluentd/images/v3.10.45-2

I'm not sure why there would be a 3.10 and a 3.11 fluentd image on the system at the same time.  That seems suspicious to me.

Please reproduce, and please leave the system up and running for me to log into and examine.

This is what I have:

green open project.test-project.f6e6beb3-c0d2-11e8-8af8-5452005eafb9.2018.09.25 jbfxcnnXTSOxjBVg9THz7A 1 0   304 0 403.9kb 403.9kb


# oc exec -c elasticsearch $espod -- es_util --qu
ery=project.test-project.*/_search?q=kubernetes.namespace_name:test-project | jq
 . |more

  "hits": {
    "total": 374,
...
        "_index": "project.test-project.f6e6beb3-c0d2-11e8-8af8-5452005eafb9.201

Comment 6 Jeff Cantrill 2018-09-25 15:24:20 UTC
Moving to 3.11.z stream:

* see #c5 - unreproducible
* we recommend users to use JSON driver
* default cluster deployments use JSON driver
* Logging CI regularly tests this driver with no failures at present

This may have to be moved into the first z stream release if we are able to further isolate the issue.

Comment 8 Rich Megginson 2018-09-26 13:59:03 UTC
grr - the format of /etc/sysconfig/docker changed - instead of

OPTIONS=' --selinux-enabled   --log-driver=journald     --signature-verification=False'

it now uses

OPTIONS=' --selinux-enabled   --log-driver journald     --signature-verification=False'

Note the space instead of = after "--log-driver" :-(

Comment 9 openshift-github-bot 2018-09-26 16:18:31 UTC
Commits pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/b2915fb5d0f425000e1926e0a5aaa6b2b14bdea7
Bug 1632648 - Record is missing kubernetes field when Docker log driver is journald

https://bugzilla.redhat.com/show_bug.cgi?id=1632648
In /etc/sysconfig/docker, the log-driver may be specified
as --log-driver=journald or --log-driver journald delimited
by one or more whitespace

https://github.com/openshift/origin-aggregated-logging/commit/3ad88ef01ee1ee7f2ae1be30f87792fd9e3c0e48
Merge pull request #1364 from richm/bug-1632648

Bug 1632648 - Record is missing kubernetes field when Docker log driver is journald

Comment 10 Rich Megginson 2018-09-26 16:21:37 UTC
Workarounds:
- change the /etc/sysconfig/docker on each node - change it to use --log-driver=journald
- oc edit ds/logging-fluentd - add env USE_JOURNAL true e.g.

oc set env ds/logging-fluentd USE_JOURNAL=true

Comment 13 Qiaoling Tang 2018-11-02 06:00:32 UTC
Verified on logging-fluentd-v3.11.36-1.

Comment 15 errata-xmlrpc 2018-11-20 03:10:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3537