Bug 1392104 - Container logs for json-file log-driver not sent to ES
Summary: Container logs for json-file log-driver not sent to ES
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: ---
Assignee: Rich Megginson
QA Contact: Xia Zhao
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-11-04 20:08 UTC by ewolinet
Modified: 2017-03-08 18:43 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
This was a pre-release issue, no docs required.
Clone Of:
Environment:
Last Closed: 2017-01-18 12:49:59 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
logging_deployment_steps (6.72 KB, text/plain)
2016-11-08 06:06 UTC, Xia Zhao
no flags Details
fluentd logs when using journald log driver (10.59 KB, text/plain)
2016-11-08 08:40 UTC, Xia Zhao
no flags Details
easy-aos.py (55.67 KB, text/x-python)
2016-11-10 02:24 UTC, Xia Zhao
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0066 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.4 RPM Release Advisory 2017-01-18 17:23:26 UTC

Description ewolinet 2016-11-04 20:08:09 UTC
Description of problem:
Container logs when using the json-file log driver don't appear to be read in by Fluentd.

Using version 3.4.0 images with prefix registry.ops.openshift.com/openshift3/ 

When rsh'd into my Fluentd pod:

sh-4.2# grep log-driver /etc/sysconfig/docker
OPTIONS=' --selinux-enabled --log-driver=json-file --log-opt max-size=50m'
sh-4.2# cat /var/log/es-containers.log.pos 
sh-4.2# cat /var/log/node.log.pos          
/var/log/messages	00000000036c984d	0000000000802358

sh-4.2# env
LOGGING_ES_PORT_9200_TCP_ADDR=172.30.112.118
OPS_COPY_PASSWORD=
HOSTNAME=logging-fluentd-efiap
GEM_HOME=/opt/app-root/src
TERM=xterm-256color
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT=tcp://172.30.0.1:443
ES_COPY_SCHEME=https
ES_CA=/etc/fluent/keys/ca
LOGGING_ES_CLUSTER_PORT_9300_TCP_ADDR=172.30.100.156
LOGGING_ES_CLUSTER_SERVICE_HOST=172.30.100.156
KUBERNETES_SERVICE_PORT=443
OPS_COPY_USERNAME=
OPS_COPY_CLIENT_CERT=
LOGGING_KIBANA_PORT_443_TCP_ADDR=172.30.36.200
KUBERNETES_PORT_53_TCP=tcp://172.30.0.1:53
KUBERNETES_SERVICE_HOST=172.30.0.1
KUBERNETES_PORT_53_UDP=udp://172.30.0.1:53
KUBERNETES_PORT_53_TCP_PORT=53
LOGGING_ES_PORT_9200_TCP_PROTO=tcp
OPS_HOST=logging-es
LOGGING_KIBANA_PORT_443_TCP=tcp://172.30.36.200:443
KUBERNETES_SERVICE_PORT_DNS=53
ES_COPY_PORT=
FLUENTD_VERSION=0.12.29
LOGGING_ES_SERVICE_HOST=172.30.112.118
ES_COPY_USERNAME=
LOGGING_ES_CLUSTER_PORT_9300_TCP_PORT=9300
OPS_COPY_CLIENT_KEY=
LOGGING_KIBANA_SERVICE_HOST=172.30.36.200
ES_COPY_PASSWORD=
KUBERNETES_PORT_53_TCP_ADDR=172.30.0.1
ES_HOST=logging-es
OPS_COPY_CA=
LOGGING_KIBANA_SERVICE_PORT=443
LOGGING_KIBANA_PORT_443_TCP_PROTO=tcp
KUBERNETES_PORT_53_UDP_ADDR=172.30.0.1
OPS_COPY_HOST=
PATH=/opt/app-root/src/bin:/opt/app-root/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LOGGING_ES_CLUSTER_PORT_9300_TCP_PROTO=tcp
LOGGING_ES_PORT_9200_TCP_PORT=9200
PWD=/opt/app-root/src
LOGGING_ES_CLUSTER_PORT=tcp://172.30.100.156:9300
ES_COPY_CA=
ES_COPY_CLIENT_CERT=
LOGGING_KIBANA_PORT=tcp://172.30.36.200:443
KUBERNETES_PORT_53_UDP_PORT=53
ES_COPY_HOST=
ES_CLIENT_CERT=/etc/fluent/keys/cert
OPS_CLIENT_KEY=/etc/fluent/keys/key
K8S_HOST_URL=https://kubernetes.default.svc.cluster.local
LOGGING_ES_CLUSTER_PORT_9300_TCP=tcp://172.30.100.156:9300
SHLVL=1
HOME=/opt/app-root/src
KUBERNETES_PORT_53_UDP_PROTO=udp
KUBERNETES_PORT_443_TCP_PROTO=tcp
OPS_CLIENT_CERT=/etc/fluent/keys/cert
LOGGING_ES_PORT=tcp://172.30.112.118:9200
KUBERNETES_SERVICE_PORT_HTTPS=443
USE_JOURNAL=True
ES_CLIENT_KEY=/etc/fluent/keys/key
LOGGING_ES_PORT_9200_TCP=tcp://172.30.112.118:9200
ES_PORT=9200
LOGGING_ES_SERVICE_PORT=9200
ES_COPY=False
KUBERNETES_PORT_53_TCP_PROTO=tcp
KUBERNETES_PORT_443_TCP_ADDR=172.30.0.1
KUBERNETES_SERVICE_PORT_DNS_TCP=53
OPS_COPY_PORT=
KUBERNETES_PORT_443_TCP=tcp://172.30.0.1:443
LOGGING_KIBANA_PORT_443_TCP_PORT=443
JOURNAL_READ_FROM_HEAD=False
RUBY_VERSION=2.0
container=docker
LOGGING_ES_CLUSTER_SERVICE_PORT=9300
OPS_COPY_SCHEME=https
ES_COPY_CLIENT_KEY=
OPS_PORT=9200
JOURNAL_SOURCE=
OPS_CA=/etc/fluent/keys/ca
_=/usr/bin/env


Version-Release number of selected component (if applicable):
Fluentd

How reproducible:
Always

Steps to Reproduce:
1. Configure Docker to use the 'json-file' log driver
2. Install EFK 3.4.0 from registry.ops.openshift.com/openshift3/
3. Observe no container logs showing up in ES

Actual results:
No container logs ending up in ES, Fluentd should have populated the es-containers.log.pos file but it is empty.

Expected results:
Containers logs should end up in ES and there should be information in the es-containers.log.pos file indicating that Fluentd has read it in.

Additional info:

$ oc version
oc v1.4.0-alpha.1+eab01e0-18
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://172.18.8.44:8443
openshift v1.4.0-alpha.1+eab01e0-18
kubernetes v1.4.0+776c994


$ cat /etc/system-release
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Comment 1 Xia Zhao 2016-11-07 02:51:47 UTC
Blocks all logging tests

Comment 2 Rich Megginson 2016-11-07 16:27:11 UTC
(In reply to Xia Zhao from comment #1)
> Blocks all logging tests

Can you use the docker `--log-driver=journald` for now?  How are you deploying logging?  I believe the plan is that OCP 3.4 will use journald as the default log driver.

Comment 3 Xia Zhao 2016-11-08 06:02:31 UTC
(In reply to Rich Megginson from comment #2)
> (In reply to Xia Zhao from comment #1)
> > Blocks all logging tests
> 
> Can you use the docker `--log-driver=journald` for now?  How are you
> deploying logging?  I believe the plan is that OCP 3.4 will use journald as
> the default log driver.

Similar issue was reproduced with journald log driver:

# oc exec logging-curator-1-o8s4q -- curator --host logging-es --use_ssl --certificate /etc/curator/keys/ca --client-cert /etc/curator/keys/cert --client-key /etc/curator/keys/key --loglevel INFO show indices --all-indices
2016-11-08 00:54:48,933 INFO      Job starting: show indices
2016-11-08 00:54:48,933 INFO      Attempting to verify SSL certificate.
2016-11-08 00:54:48,954 INFO      Matching all indices. Ignoring flags other than --exclude.
2016-11-08 00:54:48,955 INFO      Action show will be performed on the following indices: [u'.kibana', u'.operations.2016.11.08', u'.searchguard.logging-es-574e4loc-1-mv9vx']
2016-11-08 00:54:48,955 INFO      Matching indices:
.kibana
.operations.2016.11.08
.searchguard.logging-es-574e4loc-1-mv9vx

# oc get po --all-namespaces
NAMESPACE      NAME                            READY     STATUS             RESTARTS   AGE
default        docker-registry-3-0h7uf         1/1       Running            2          14h
default        registry-console-1-v4xl7        1/1       Running            1          14h
default        router-1-tbw8s                  1/1       Running            2          14h
install-test   cakephp-mysql-example-1-54f4o   1/1       Running            2          14h
install-test   cakephp-mysql-example-1-build   0/1       Completed          0          14h
install-test   mysql-1-yns2k                   1/1       Running            2          14h
xiazhao        logging-curator-1-o8s4q         1/1       Running            0          13m
xiazhao        logging-deployer-as85s          0/1       Completed          0          13m
xiazhao        logging-es-574e4loc-1-mv9vx     1/1       Running            0          13m
xiazhao        logging-fluentd-pbwys           1/1       Running            0          13m
xiazhao        logging-kibana-1-1kn2u          1/2       CrashLoopBackOff   7          13m

# oc get po
NAME                          READY     STATUS             RESTARTS   AGE
logging-curator-1-o8s4q       1/1       Running            0          10m
logging-deployer-as85s        0/1       Completed          0          10m
logging-es-574e4loc-1-mv9vx   1/1       Running            0          10m
logging-fluentd-pbwys         1/1       Running            0          10m
logging-kibana-1-1kn2u        1/2       CrashLoopBackOff   6          10m


# oc rsh logging-fluentd-pbwys
sh-4.2# grep log-driver /etc/sysconfig/docker
OPTIONS=' --selinux-enabled --log-driver=journald'

Comment 5 Xia Zhao 2016-11-08 06:06:29 UTC
Created attachment 1218412 [details]
logging_deployment_steps

Comment 6 Xia Zhao 2016-11-08 06:07:37 UTC
Please note that the problem about kibana pod crash was tracked by https://bugzilla.redhat.com/show_bug.cgi?id=1391803

Comment 7 Noriko Hosoi 2016-11-08 06:31:01 UTC
I could not duplicate the problem.  Not sure yet, but could there be some configuration issue?

$ oc exec logging-curator-1-ylrxx -- curator --host logging-es --use_ssl --certificate /etc/curator/keys/ca --client-cert /etc/curator/keys/cert --client-key /etc/curator/keys/key --loglevel INFO show indices --all-indices
2016-11-08 06:23:17,868 INFO      Job starting: show indices
2016-11-08 06:23:17,868 INFO      Attempting to verify SSL certificate.
2016-11-08 06:23:17,987 INFO      Matching all indices. Ignoring flags other than --exclude.
2016-11-08 06:23:17,987 INFO      Action show will be performed on the following indices: [u'.kibana', u'.kibana.8a335f3ed3af10cd2164cd093b2ac0c6d1f270e1', u'.searchguard.logging-es-rh0bufx4-2-y013x', u'project.logging.c99162d3-a554-11e6-9e7c-0ed816ca5852.2016.11.08', u'project.test.170bfc84-a557-11e6-9e7c-0ed816ca5852.2016.11.08']
2016-11-08 06:23:17,987 INFO      Matching indices:
.kibana
.kibana.8a335f3ed3af10cd2164cd093b2ac0c6d1f270e1
.searchguard.logging-es-rh0bufx4-2-y013x
project.logging.c99162d3-a554-11e6-9e7c-0ed816ca5852.2016.11.08
project.test.170bfc84-a557-11e6-9e7c-0ed816ca5852.2016.11.08

$ oc get po --all-namespaces
NAMESPACE   NAME                              READY     STATUS      RESTARTS   AGE
default     docker-registry-1-v5z7j           1/1       Running     0          4h
default     router-1-6beck                    1/1       Running     0          4h
logging     logging-auth-proxy-1-build        0/1       Completed   0          4h
logging     logging-curator-1-build           0/1       Completed   0          4h
logging     logging-curator-1-ylrxx           1/1       Running     0          4h
logging     logging-curator-ops-1-vx5lb       1/1       Running     0          4h
logging     logging-deployer-9jmyi            0/1       Completed   0          4h
logging     logging-deployment-1-build        0/1       Completed   0          4h
logging     logging-elasticsearch-1-build     0/1       Completed   0          4h
logging     logging-es-ops-xccsuskf-2-f5to9   1/1       Running     0          4h
logging     logging-es-rh0bufx4-2-y013x       1/1       Running     0          4h
logging     logging-fluentd-1-build           0/1       Completed   0          4h
logging     logging-fluentd-iy5pg             1/1       Running     0          4h
logging     logging-kibana-1-build            0/1       Completed   0          4h
logging     logging-kibana-1-x4sh1            2/2       Running     0          4h
logging     logging-kibana-ops-1-jgcag        2/2       Running     0          4h
test        database-1-j0ypy                  1/1       Running     0          4h
test        frontend-1-263bo                  1/1       Running     0          4h
test        frontend-1-bacnc                  1/1       Running     0          4h
test        ruby-sample-build-1-build         0/1       Completed   0          4h

$ oc exec logging-fluentd-iy5pg -- grep log-driver /etc/sysconfig/docker
OPTIONS='--insecure-registry=172.30.0.0/16 --insecure-registry=ci.dev.openshift.redhat.com:5000 --selinux-enabled --log-driver=json-file'

That is, the log-driver type does not matter.

BTW, where can I find easy-aos.py?  Thanks!

Comment 8 Xia Zhao 2016-11-08 08:40:39 UTC
Created attachment 1218454 [details]
fluentd logs when using journald log driver

Comment 9 Xia Zhao 2016-11-08 08:47:56 UTC
(In reply to Noriko Hosoi from comment #7)
> I could not duplicate the problem.  Not sure yet, but could there be some
> configuration issue?
> 
> $ oc exec logging-curator-1-ylrxx -- curator --host logging-es --use_ssl
> --certificate /etc/curator/keys/ca --client-cert /etc/curator/keys/cert
> --client-key /etc/curator/keys/key --loglevel INFO show indices --all-indices
> 2016-11-08 06:23:17,868 INFO      Job starting: show indices
> 2016-11-08 06:23:17,868 INFO      Attempting to verify SSL certificate.
> 2016-11-08 06:23:17,987 INFO      Matching all indices. Ignoring flags other
> than --exclude.
> 2016-11-08 06:23:17,987 INFO      Action show will be performed on the
> following indices: [u'.kibana',
> u'.kibana.8a335f3ed3af10cd2164cd093b2ac0c6d1f270e1',
> u'.searchguard.logging-es-rh0bufx4-2-y013x',
> u'project.logging.c99162d3-a554-11e6-9e7c-0ed816ca5852.2016.11.08',
> u'project.test.170bfc84-a557-11e6-9e7c-0ed816ca5852.2016.11.08']
> 2016-11-08 06:23:17,987 INFO      Matching indices:
> .kibana
> .kibana.8a335f3ed3af10cd2164cd093b2ac0c6d1f270e1
> .searchguard.logging-es-rh0bufx4-2-y013x
> project.logging.c99162d3-a554-11e6-9e7c-0ed816ca5852.2016.11.08
> project.test.170bfc84-a557-11e6-9e7c-0ed816ca5852.2016.11.08
> 
> $ oc get po --all-namespaces
> NAMESPACE   NAME                              READY     STATUS      RESTARTS
> AGE
> default     docker-registry-1-v5z7j           1/1       Running     0       
> 4h
> default     router-1-6beck                    1/1       Running     0       
> 4h
> logging     logging-auth-proxy-1-build        0/1       Completed   0       
> 4h
> logging     logging-curator-1-build           0/1       Completed   0       
> 4h
> logging     logging-curator-1-ylrxx           1/1       Running     0       
> 4h
> logging     logging-curator-ops-1-vx5lb       1/1       Running     0       
> 4h
> logging     logging-deployer-9jmyi            0/1       Completed   0       
> 4h
> logging     logging-deployment-1-build        0/1       Completed   0       
> 4h
> logging     logging-elasticsearch-1-build     0/1       Completed   0       
> 4h
> logging     logging-es-ops-xccsuskf-2-f5to9   1/1       Running     0       
> 4h
> logging     logging-es-rh0bufx4-2-y013x       1/1       Running     0       
> 4h
> logging     logging-fluentd-1-build           0/1       Completed   0       
> 4h
> logging     logging-fluentd-iy5pg             1/1       Running     0       
> 4h
> logging     logging-kibana-1-build            0/1       Completed   0       
> 4h
> logging     logging-kibana-1-x4sh1            2/2       Running     0       
> 4h
> logging     logging-kibana-ops-1-jgcag        2/2       Running     0       
> 4h
> test        database-1-j0ypy                  1/1       Running     0       
> 4h
> test        frontend-1-263bo                  1/1       Running     0       
> 4h
> test        frontend-1-bacnc                  1/1       Running     0       
> 4h
> test        ruby-sample-build-1-build         0/1       Completed   0       
> 4h
> 
> $ oc exec logging-fluentd-iy5pg -- grep log-driver /etc/sysconfig/docker
> OPTIONS='--insecure-registry=172.30.0.0/16
> --insecure-registry=ci.dev.openshift.redhat.com:5000 --selinux-enabled
> --log-driver=json-file'
> 
> That is, the log-driver type does not matter.
> 
> BTW, where can I find easy-aos.py?  Thanks!

It's weird, you can login my env shared in comment #4, and query ES from curator, the container logs did not exist:

[root@host-8-175-206 ~]# oc exec logging-curator-1-o8s4q -- curator --host logging-es --use_ssl --certificate /etc/curator/keys/ca --client-cert /etc/curator/keys/cert --client-key /etc/curator/keys/key --loglevel INFO show indices --all-indices
2016-11-08 02:58:34,837 INFO      Job starting: show indices
2016-11-08 02:58:34,837 INFO      Attempting to verify SSL certificate.
2016-11-08 02:58:34,944 INFO      Matching all indices. Ignoring flags other than --exclude.
2016-11-08 02:58:34,944 INFO      Action show will be performed on the following indices: [u'.kibana', u'.operations.2016.11.08', u'.searchguard.logging-es-574e4loc-1-mv9vx']
2016-11-08 02:58:34,944 INFO      Matching indices:
.kibana
.operations.2016.11.08
.searchguard.logging-es-574e4loc-1-mv9vx

I have my docker log driver set to journald, and my logging-deployer configmap contain this:
use-journal: "true"

The easy-aos.py is a cript used by QE to deploy logging, I can attach it if needed. Just want to reflect the detailed step for my logging deployment.

Comment 10 Xia Zhao 2016-11-08 08:51:01 UTC
(In reply to Noriko Hosoi from comment #7)
> I could not duplicate the problem.  Not sure yet, but could there be some
> configuration issue?
> 
> $ oc exec logging-curator-1-ylrxx -- curator --host logging-es --use_ssl
> --certificate /etc/curator/keys/ca --client-cert /etc/curator/keys/cert
> --client-key /etc/curator/keys/key --loglevel INFO show indices --all-indices
> 2016-11-08 06:23:17,868 INFO      Job starting: show indices
> 2016-11-08 06:23:17,868 INFO      Attempting to verify SSL certificate.
> 2016-11-08 06:23:17,987 INFO      Matching all indices. Ignoring flags other
> than --exclude.
> 2016-11-08 06:23:17,987 INFO      Action show will be performed on the
> following indices: [u'.kibana',
> u'.kibana.8a335f3ed3af10cd2164cd093b2ac0c6d1f270e1',
> u'.searchguard.logging-es-rh0bufx4-2-y013x',
> u'project.logging.c99162d3-a554-11e6-9e7c-0ed816ca5852.2016.11.08',
> u'project.test.170bfc84-a557-11e6-9e7c-0ed816ca5852.2016.11.08']
> 2016-11-08 06:23:17,987 INFO      Matching indices:
> .kibana
> .kibana.8a335f3ed3af10cd2164cd093b2ac0c6d1f270e1
> .searchguard.logging-es-rh0bufx4-2-y013x
> project.logging.c99162d3-a554-11e6-9e7c-0ed816ca5852.2016.11.08
> project.test.170bfc84-a557-11e6-9e7c-0ed816ca5852.2016.11.08
> 
> $ oc get po --all-namespaces
> NAMESPACE   NAME                              READY     STATUS      RESTARTS
> AGE
> default     docker-registry-1-v5z7j           1/1       Running     0       
> 4h
> default     router-1-6beck                    1/1       Running     0       
> 4h
> logging     logging-auth-proxy-1-build        0/1       Completed   0       
> 4h
> logging     logging-curator-1-build           0/1       Completed   0       
> 4h
> logging     logging-curator-1-ylrxx           1/1       Running     0       
> 4h
> logging     logging-curator-ops-1-vx5lb       1/1       Running     0       
> 4h
> logging     logging-deployer-9jmyi            0/1       Completed   0       
> 4h
> logging     logging-deployment-1-build        0/1       Completed   0       
> 4h
> logging     logging-elasticsearch-1-build     0/1       Completed   0       
> 4h
> logging     logging-es-ops-xccsuskf-2-f5to9   1/1       Running     0       
> 4h
> logging     logging-es-rh0bufx4-2-y013x       1/1       Running     0       
> 4h
> logging     logging-fluentd-1-build           0/1       Completed   0       
> 4h
> logging     logging-fluentd-iy5pg             1/1       Running     0       
> 4h
> logging     logging-kibana-1-build            0/1       Completed   0       
> 4h
> logging     logging-kibana-1-x4sh1            2/2       Running     0       
> 4h
> logging     logging-kibana-ops-1-jgcag        2/2       Running     0       
> 4h
> test        database-1-j0ypy                  1/1       Running     0       
> 4h
> test        frontend-1-263bo                  1/1       Running     0       
> 4h
> test        frontend-1-bacnc                  1/1       Running     0       
> 4h
> test        ruby-sample-build-1-build         0/1       Completed   0       
> 4h
> 
> $ oc exec logging-fluentd-iy5pg -- grep log-driver /etc/sysconfig/docker
> OPTIONS='--insecure-registry=172.30.0.0/16
> --insecure-registry=ci.dev.openshift.redhat.com:5000 --selinux-enabled
> --log-driver=json-file'
> 
> That is, the log-driver type does not matter.
> 
> BTW, where can I find easy-aos.py?  Thanks!
Are you testing on origin? Which images are you using? May I login the env to take a look if possible?

Comment 11 Rich Megginson 2016-11-08 14:42:43 UTC
fluentd is crashing:

2016-11-08 09:41:26 -0500 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-08 09:41:27 -0500 error_class="NoMethodError" error="undefined method `[]' for nil:NilClass" plugin_id="object:1d8ee8c"
  2016-11-08 09:41:26 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.7.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:229:in `eval'
  2016-11-08 09:41:26 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.7.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:229:in `eval'
  2016-11-08 09:41:26 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.7.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:229:in `expand_param'
  2016-11-08 09:41:26 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.7.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:125:in `block (2 levels) in write'
  2016-11-08 09:41:26 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.7.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:120:in `each'
  2016-11-08 09:41:26 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.7.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:120:in `block in write'
  2016-11-08 09:41:26 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.29/lib/fluent/plugin/buf_memory.rb:67:in `feed_each'
  2016-11-08 09:41:26 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.29/lib/fluent/plugin/buf_memory.rb:67:in `msgpack_each'
  2016-11-08 09:41:26 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.7.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:116:in `write'
  2016-11-08 09:41:26 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.29/lib/fluent/buffer.rb:354:in `write_chunk'
  2016-11-08 09:41:26 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.29/lib/fluent/buffer.rb:333:in `pop'
  2016-11-08 09:41:26 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.29/lib/fluent/output.rb:338:in `try_flush'
  2016-11-08 09:41:26 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.29/lib/fluent/output.rb:149:in `run'

Investigating

Comment 12 Rich Megginson 2016-11-08 18:02:57 UTC
The crash is because the flatten-hash filter was never removed from the fluent container.  This is now fixed:

12060282 buildContainer (noarch) completed successfully
koji_builds:
  https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=522705
repositories:
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:rhaos-3.4-rhel-7-docker-candidate-20161108124846
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:3.4.0-4
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:3.4.0
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:latest
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:v3.4

I've also fixed this in the upstream deployer: https://github.com/openshift/origin-aggregated-logging/pull/275

Once that merges, we will build a new deployer image.  But that isn't important - if the file is removed from the fluentd image (as above) it won't be used.

The other part of this is to change logging to use the common data model when using the json-file log driver: https://github.com/openshift/origin-aggregated-logging/pull/272

So, use the new fluentd image with the journald log driver as a workaround.

Comment 13 Xia Zhao 2016-11-09 02:57:32 UTC
Hi Rich, 

Thank you for fixing it. Journald log driver is now working for me with the latest fluentd image on brew:

openshift3/logging-fluentd    aeb6c2153ab7
openshift3/logging-deployer    5f4ddb8cf40d
openshift3/logging-elasticsearch    9b9452c0f8c2
openshift3/logging-kibana    7fc9916eea4d
openshift3/logging-auth-proxy    0d8e3202af61
openshift3/logging-curator    9af78fc06248

Tested on 

# openshift version
openshift v3.4.0.23+24b1a58
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0


# docker version
Client:
 Version:         1.12.3
 API version:     1.24
 Package version: docker-common-1.12.3-4.el7.x86_64
 Go version:      go1.6.2
 Git commit:      f320458-redhat
 Built:           Mon Nov  7 10:15:24 2016
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.3
 API version:     1.24
 Package version: docker-common-1.12.3-4.el7.x86_64
 Go version:      go1.6.2
 Git commit:      f320458-redhat
 Built:           Mon Nov  7 10:15:24 2016
 OS/Arch:         linux/amd64


Index of the user projects are visible:

$ oc exec logging-curator-1-ga59n -- curator --host logging-es --use_ssl --certificate /etc/curator/keys/ca --client-cert /etc/curator/keys/cert --client-key /etc/curator/keys/key --loglevel INFO show indices --all-indices
2016-11-08 21:34:08,772 INFO      Job starting: show indices
2016-11-08 21:34:08,772 INFO      Attempting to verify SSL certificate.
2016-11-08 21:34:08,880 INFO      Matching all indices. Ignoring flags other than --exclude.
2016-11-08 21:34:08,880 INFO      Action show will be performed on the following indices: [u'.kibana',
u'.operations.2016.11.09', u'.searchguard.logging-es-h6a5k1eh-1-5yceg',
u'project.install-test.627931ab-a619-11e6-912c-fa163e4c9831.2016.11.09',
u'project.xiazhao.19f21915-a623-11e6-b108-fa163e4c9831.2016.11.09']
2016-11-08 21:34:08,880 INFO      Matching indices:
.kibana
.operations.2016.11.09
.searchguard.logging-es-h6a5k1eh-1-5yceg
project.install-test.627931ab-a619-11e6-912c-fa163e4c9831.2016.11.09
project.xiazhao.19f21915-a623-11e6-b108-fa163e4c9831.2016.11.09

Comment 14 Xia Zhao 2016-11-09 03:43:40 UTC
And I keep the keyword TestBlocker here since the issue for json-file log driver is blocking logging functions which are not fit for journal log driver, e.g. ES copy.

Comment 15 Rich Megginson 2016-11-09 03:48:15 UTC
new deployer image:

repositories = brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-deployer:rhaos-3.4-rhel-7-docker-candidate-20161108223639, brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-deployer:3.4.0-21, brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-deployer:3.4.0, brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-deployer:latest, brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-deployer:v3.4 

and new fluentd image:

repositories = brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:rhaos-3.4-rhel-7-docker-candidate-20161108223331, brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:3.4.0-5, brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:3.4.0, brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:latest, brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:v3.4

Comment 16 Peng Li 2016-11-09 07:55:28 UTC
verified 

[root@openshift-125 ~]# oc exec logging-curator-1-6aqsc -- curator --host logging-es --use_ssl --certificate /etc/curator/keys/ca --client-cert /etc/curator/keys/cert --client-key /etc/curator/keys/key --loglevel INFO show indices --all-indices
2016-11-09 02:52:38,770 INFO      Job starting: show indices
2016-11-09 02:52:38,770 INFO      Attempting to verify SSL certificate.
2016-11-09 02:52:38,968 INFO      Matching all indices. Ignoring flags other than --exclude.
2016-11-09 02:52:38,969 INFO      Action show will be performed on the following indices: [u'.kibana', u'.operations.2016.11.09', u'.searchguard.logging-es-84dwrk7b-1-xts36', u'project.install-test.c5c8c565-a629-11e6-9cd0-fa163edac221.2016.11.09', u'project.logging.ed8fc566-a61c-11e6-9cd0-fa163edac221.2016.11.09']
2016-11-09 02:52:38,969 INFO      Matching indices:
.kibana
.operations.2016.11.09
.searchguard.logging-es-84dwrk7b-1-xts36
project.install-test.c5c8c565-a629-11e6-9cd0-fa163edac221.2016.11.09
project.logging.ed8fc566-a61c-11e6-9cd0-fa163edac221.2016.11.09



[root@openshift-148 ~]# grep log /etc/sysconfig/docker
OPTIONS=' --selinux-enabled --log-driver=json-file --log-opt max-size=50m'

Deploy logging with 'use-journal=false'

Version:
[root@openshift-148 ~]# docker images | grep logging
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-deployer        3.4.0               c364ab9c2f75        4 hours ago         762.3 MB
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd         3.4.0               125ed17f96bc        4 hours ago         231.7 MB

Comment 18 Xia Zhao 2016-11-10 02:24:06 UTC
Created attachment 1219132 [details]
easy-aos.py

Comment 20 errata-xmlrpc 2017-01-18 12:49:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066


Note You need to log in before you can comment on or make changes to this bug.