Description of problem:

Logging was being installed, but the fluentd pods fail to start with "Connection refused". The error does not say what fluentd is trying to connect to, and manual connection attempts from a debug pod work.

Version-Release number of selected component (if applicable):

registry.access.redhat.com/openshift3/logging-fluentd:v3.9.30

How reproducible:

Unconfirmed in test env

atomic install --system \
    --storage=ostree \
    --set INVENTORY_FILE=/home/cloud-user/openshift/hosts \
    --set PLAYBOOK_FILE=/usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml \
    --set OPTS="-v" \
    registry.access.redhat.com/openshift3/ose-ansible:v3.9

LOGS FROM logging-fluentd-*

umounts of dead containers will fail. Ignoring...
umount: /var/lib/docker/containers/16ab49d58ba6124122c4f88821865e5f441a38e4b18dba64dff498b5d6bb6306/shm: not mounted
umount: /var/lib/docker/containers/2bf9d91728f60b0554d0f17f00c03d6b5f467e4c9713f60a5131412f75b28775/shm: not mounted
umount: /var/lib/docker/containers/3f68c6647495ea44970612f5cc26f927daa2a3fbcebd1f728b3cceb447454f41/shm: not mounted
umount: /var/lib/docker/containers/484f42ab3033d062c4dd46da8188392eb42db71b473be82b729bd79179fadd58/shm: not mounted
umount: /var/lib/docker/containers/6fb9739957b02f341d4c7e19a86d377beeb053ae609830482f0b5295345e9a30/shm: not mounted
umount: /var/lib/docker/containers/7a2b21769572165fea78bc1a6eeedb5fb2925c395c1038692a65988065c5a73c/shm: not mounted
umount: /var/lib/docker/containers/86d9a4f499fe9452ed469273c7e1720f943ba32a6a61522e74ded61296c73745/shm: not mounted
umount: /var/lib/docker/containers/c63999ef3fa5a974dace782696edb40e78f59ce8ff2a1845869fed991356fd08/shm: not mounted
umount: /var/lib/docker/containers/e8c9cd0ca4114504a2c5471da8e00c1f686104f9a0c5147b9fb51ec2fce86839/shm: not mounted
umount: /var/lib/docker/containers/fa7070323fc6f10519e78710c57802512e219cc10c76ee0d68e901230241a175/shm: not mounted
[DEPRECATION WARNING]: 'merge_json_log' for the fluent-plugin-kubernetes_metadata_filter will no longer be supported in future releases
2018-06-13 10:07:53 -0700 [info]: reading config file path="/etc/fluent/fluent.conf"
2018-06-13 10:07:54 -0700 [error]: unexpected error error="Connection refused - connect(2)"
2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/net/http.rb:878:in `initialize'
2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/net/http.rb:878:in `open'
2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/net/http.rb:878:in `block in connect'
2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/timeout.rb:52:in `timeout'
2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/net/http.rb:877:in `connect'
2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/net/http.rb:862:in `do_start'
2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/net/http.rb:851:in `start'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:715:in `transmit'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:145:in `execute'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:52:in `execute'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/resource.rb:51:in `get'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:328:in `block in api'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:58:in `handle_exception'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:327:in `api'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:322:in `api_valid?'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-1.0.3/lib/fluent/plugin/filter_kubernetes_metadata.rb:227:in `configure'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:145:in `add_filter'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:62:in `block in configure'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:57:in `each'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:57:in `configure'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `block in configure'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `each'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `configure'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/engine.rb:129:in `configure'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/engine.rb:103:in `run_configure'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:489:in `run_configure'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:174:in `block in start'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:366:in `call'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:366:in `main_process'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:170:in `start'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/command/fluentd.rb:173:in `<top (required)>'
2018-06-13 10:07:54 -0700 [error]: /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
2018-06-13 10:07:54 -0700 [error]: /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/bin/fluentd:8:in `<top (required)>'
2018-06-13 10:07:54 -0700 [error]: /usr/bin/fluentd:23:in `load'
2018-06-13 10:07:54 -0700 [error]: /usr/bin/fluentd:23:in `<main>'
Using "oc debug" we were able to get some good info from the fluentd pod context:

# env | sort
BUFFER_QUEUE_LIMIT=32
BUFFER_SIZE_LIMIT=8m
DATA_VERSION=1.6.0
ES_CA=/etc/fluent/keys/ca
ES_CLIENT_CERT=/etc/fluent/keys/cert
ES_CLIENT_KEY=/etc/fluent/keys/key
ES_HOST=logging-es
ES_PORT=9200
FILE_BUFFER_LIMIT=256Mi
FLUENTD_AUDIT_LOG_PARSER_VERSION=0.0.5
FLUENTD_CPU_LIMIT=4
FLUENTD_ES=1.13.0-1
FLUENTD_KUBE_METADATA=1.0.1-1
FLUENTD_MEMORY_LIMIT=536870912
FLUENTD_RECORD_MODIFIER=0.6.1
FLUENTD_REWRITE_TAG=1.5.6-1
FLUENTD_SECURE_FWD=0.4.5-2
FLUENTD_SYSTEMD=0.0.9-1
FLUENTD_VERSION=0.12.42
FLUENTD_VIAQ_DATA_MODEL=0.0.13
GEM_HOME=/opt/app-root/src
HOME=/opt/app-root/src
HOSTNAME=logging-fluentd-d9dld-debug
JOURNAL_READ_FROM_HEAD=
JOURNAL_SOURCE=
K8S_HOST_URL=https://kubernetes.default.svc.cluster.local
KUBERNETES_PORT=tcp://172.30.0.1:443
KUBERNETES_PORT_443_TCP=tcp://172.30.0.1:443
KUBERNETES_PORT_443_TCP_ADDR=172.30.0.1
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_PORT_53_TCP=tcp://172.30.0.1:53
KUBERNETES_PORT_53_TCP_ADDR=172.30.0.1
KUBERNETES_PORT_53_TCP_PORT=53
KUBERNETES_PORT_53_TCP_PROTO=tcp
KUBERNETES_PORT_53_UDP=udp://172.30.0.1:53
KUBERNETES_PORT_53_UDP_ADDR=172.30.0.1
KUBERNETES_PORT_53_UDP_PORT=53
KUBERNETES_PORT_53_UDP_PROTO=udp
KUBERNETES_SERVICE_HOST=172.30.0.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_DNS=53
KUBERNETES_SERVICE_PORT_DNS_TCP=53
KUBERNETES_SERVICE_PORT_HTTPS=443
LOGGING_ES_CLUSTER_PORT=tcp://172.30.222.181:9300
LOGGING_ES_CLUSTER_PORT_9300_TCP=tcp://172.30.222.181:9300
LOGGING_ES_CLUSTER_PORT_9300_TCP_ADDR=172.30.222.181
LOGGING_ES_CLUSTER_PORT_9300_TCP_PORT=9300
LOGGING_ES_CLUSTER_PORT_9300_TCP_PROTO=tcp
LOGGING_ES_CLUSTER_SERVICE_HOST=172.30.222.181
LOGGING_ES_CLUSTER_SERVICE_PORT=9300
LOGGING_ES_PORT=tcp://172.30.74.55:9200
LOGGING_ES_PORT_9200_TCP=tcp://172.30.74.55:9200
LOGGING_ES_PORT_9200_TCP_ADDR=172.30.74.55
LOGGING_ES_PORT_9200_TCP_PORT=9200
LOGGING_ES_PORT_9200_TCP_PROTO=tcp
LOGGING_ES_PROMETHEUS_PORT=tcp://172.30.170.82:443
LOGGING_ES_PROMETHEUS_PORT_443_TCP=tcp://172.30.170.82:443
LOGGING_ES_PROMETHEUS_PORT_443_TCP_ADDR=172.30.170.82
LOGGING_ES_PROMETHEUS_PORT_443_TCP_PORT=443
LOGGING_ES_PROMETHEUS_PORT_443_TCP_PROTO=tcp
LOGGING_ES_PROMETHEUS_SERVICE_HOST=172.30.170.82
LOGGING_ES_PROMETHEUS_SERVICE_PORT=443
LOGGING_ES_PROMETHEUS_SERVICE_PORT_PROXY=443
LOGGING_ES_SERVICE_HOST=172.30.74.55
LOGGING_ES_SERVICE_PORT=9200
LOGGING_KIBANA_PORT=tcp://172.30.60.146:443
LOGGING_KIBANA_PORT_443_TCP=tcp://172.30.60.146:443
LOGGING_KIBANA_PORT_443_TCP_ADDR=172.30.60.146
LOGGING_KIBANA_PORT_443_TCP_PORT=443
LOGGING_KIBANA_PORT_443_TCP_PROTO=tcp
LOGGING_KIBANA_SERVICE_HOST=172.30.60.146
LOGGING_KIBANA_SERVICE_PORT=443
OCP_OPERATIONS_PROJECTS=default openshift openshift-
OPS_CA=/etc/fluent/keys/ops-ca
OPS_CLIENT_CERT=/etc/fluent/keys/ops-cert
OPS_CLIENT_KEY=/etc/fluent/keys/ops-key
OPS_HOST=logging-es
OPS_PORT=9200
PATH=/opt/app-root/src/bin:/opt/app-root/bin:/usr/libexec/fluentd/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/opt/app-root/src
RUBY_VERSION=2.0
SHLVL=1
TERM=xterm
_=/usr/bin/env
container=oci

sh-4.2# curl -kv $ES_HOST:9200
* About to connect() to logging-es port 9200 (#0)
*   Trying 172.30.74.55...
* Connected to logging-es (172.30.74.55) port 9200 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: logging-es:9200
> Accept: */*
>
* Empty reply from server
* Connection #0 to host logging-es left intact
curl: (52) Empty reply from server

sh-4.2# curl -kv logging-es:9200
* About to connect() to logging-es port 9200 (#0)
*   Trying 172.30.74.55...
* Connected to logging-es (172.30.74.55) port 9200 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: logging-es:9200
> Accept: */*
>
* Empty reply from server
* Connection #0 to host logging-es left intact
curl: (52) Empty reply from server
sh-4.2#
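For completeness, a client-cert request using the ES_* paths from the env dump above would probably be a cleaner check of the Elasticsearch side (sketch only; in this stack ES normally expects TLS with client certificates on 9200, which would explain the empty reply to a plain-HTTP curl):

sh-4.2# curl -sv https://$ES_HOST:$ES_PORT \
            --cacert $ES_CA --cert $ES_CLIENT_CERT --key $ES_CLIENT_KEY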
*** This bug has been marked as a duplicate of bug 1560170 ***
Re-opening. Customer has the latest image and is still seeing the issue, so I think this might be something new.

registry.access.redhat.com/openshift3/logging-fluentd   latest    3f822960402b   2 weeks ago   286 MB
registry.access.redhat.com/openshift3/logging-fluentd   v3.9.30   3f822960402b   2 weeks ago   286 MB

Uploading fluentd log with trace level.
Please provide the output of logging-dump.sh - this script also tests inter-pod connectivity, which will rule out problems with e.g. ovs or other network virtualization.
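In case it helps, the script comes from the origin-aggregated-logging repository; something along these lines should work (URL/branch from memory, so adjust as needed, and run it while logged in as cluster-admin):

# fetch and run logging-dump.sh from a master with cluster-admin credentials
curl -O https://raw.githubusercontent.com/openshift/origin-aggregated-logging/master/hack/logging-dump.sh
bash logging-dump.sh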
This was attached at attachment 1451554 [details] -- though the info you're looking for might be missing since the fluentd pods weren't running. Is there anything you need us to run in an "oc debug" context?

# oc debug logging-fluentd-<podname>

^ This lets us spin up a copy of the pod without running its CMD, so we can test things from the perspective of the fluentd pod (see the example session below).
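For example, this is roughly what we have been doing (the pod name is just a placeholder for one of the failing pods; the component=fluentd label selector is our assumption about how the 3.9 logging daemonset labels its pods):

# oc project logging
# oc get pods -l component=fluentd
# oc debug logging-fluentd-<podname>
sh-4.2# env | sort
sh-4.2# curl -kv $ES_HOST:$ES_PORT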
It isn't the connectivity to Elasticsearch that is the issue - it is the connectivity to Kubernetes:

2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:322:in `api_valid?'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-1.0.3/lib/fluent/plugin/filter_kubernetes_metadata.rb:227:in `configure'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:145:in `add_filter'
...
2018-06-13 10:07:54 -0700 [error]: /usr/bin/fluentd:23:in `<main>'

That is the kubeclient code failing, not the Elasticsearch REST code. Try this using oc debug logging-fluentd-<podname>:

curl -v https://kubernetes.default.svc.cluster.local --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
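If that connects, a follow-up that is closer to what the plugin actually does at startup (kubeclient's api_valid? is a GET against the /api path) would be to repeat the request authenticated with the pod's service-account token - a sketch, assuming the standard service-account mount is present in the pod:

TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -v https://kubernetes.default.svc.cluster.local/api \
     --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
     -H "Authorization: Bearer $TOKEN"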