Bug 1591452

Summary: fluentd fails with unexpected error error="Connection refused - connect(2)"
Product: OpenShift Container Platform
Component: Networking
Version: 3.9.0
Target Release: 3.9.z
Target Milestone: ---
Hardware: Unspecified
OS: Unspecified
Reporter: Steven Walter <stwalter>
Assignee: Casey Callendrello <cdc>
QA Contact: Meng Bo <bmeng>
CC: aos-bugs, pasik, rmeggins
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: unspecified
Keywords: Reopened
Type: Bug
Doc Type: If docs needed, set a value
Last Closed: 2018-11-02 18:46:12 UTC

Description Steven Walter 2018-06-14 18:30:05 UTC
Description of problem:
Logging was installed, but the fluentd pods fail to start with a "connection refused" error. The error does not say what fluentd is trying to connect to, and manual connection attempts from a debug pod succeed.

Version-Release number of selected component (if applicable):
registry.access.redhat.com/openshift3/logging-fluentd:v3.9.30

How reproducible:
Unconfirmed in test env


atomic install --system \
  --storage=ostree \
  --set INVENTORY_FILE=/home/cloud-user/openshift/hosts \
  --set PLAYBOOK_FILE=/usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml \
  --set OPTS="-v" \
  registry.access.redhat.com/openshift3/ose-ansible:v3.9
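
For reference, the failing pods and their startup logs can be pulled as follows (assuming the default "logging" namespace and the component=fluentd label used by the aggregated-logging daemonset; add --previous to oc logs if the pod has already restarted):

oc get pods -n logging -l component=fluentd
oc logs -n logging logging-fluentd-<podname>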

LOGS FROM logging-fluentd-* 

umounts of dead containers will fail. Ignoring...
umount: /var/lib/docker/containers/16ab49d58ba6124122c4f88821865e5f441a38e4b18dba64dff498b5d6bb6306/shm: not mounted
umount: /var/lib/docker/containers/2bf9d91728f60b0554d0f17f00c03d6b5f467e4c9713f60a5131412f75b28775/shm: not mounted
umount: /var/lib/docker/containers/3f68c6647495ea44970612f5cc26f927daa2a3fbcebd1f728b3cceb447454f41/shm: not mounted
umount: /var/lib/docker/containers/484f42ab3033d062c4dd46da8188392eb42db71b473be82b729bd79179fadd58/shm: not mounted
umount: /var/lib/docker/containers/6fb9739957b02f341d4c7e19a86d377beeb053ae609830482f0b5295345e9a30/shm: not mounted
umount: /var/lib/docker/containers/7a2b21769572165fea78bc1a6eeedb5fb2925c395c1038692a65988065c5a73c/shm: not mounted
umount: /var/lib/docker/containers/86d9a4f499fe9452ed469273c7e1720f943ba32a6a61522e74ded61296c73745/shm: not mounted
umount: /var/lib/docker/containers/c63999ef3fa5a974dace782696edb40e78f59ce8ff2a1845869fed991356fd08/shm: not mounted
umount: /var/lib/docker/containers/e8c9cd0ca4114504a2c5471da8e00c1f686104f9a0c5147b9fb51ec2fce86839/shm: not mounted
umount: /var/lib/docker/containers/fa7070323fc6f10519e78710c57802512e219cc10c76ee0d68e901230241a175/shm: not mounted
[DEPRECATION WARNING]: 'merge_json_log' for the fluent-plugin-kubernetes_metadata_filter will no longer be supported in future releases
2018-06-13 10:07:53 -0700 [info]: reading config file path="/etc/fluent/fluent.conf"
2018-06-13 10:07:54 -0700 [error]: unexpected error error="Connection refused - connect(2)"
  2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/net/http.rb:878:in `initialize'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/net/http.rb:878:in `open'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/net/http.rb:878:in `block in connect'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/timeout.rb:52:in `timeout'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/net/http.rb:877:in `connect'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/net/http.rb:862:in `do_start'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/net/http.rb:851:in `start'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:715:in `transmit'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:145:in `execute'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:52:in `execute'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/resource.rb:51:in `get'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:328:in `block in api'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:58:in `handle_exception'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:327:in `api'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:322:in `api_valid?'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-1.0.3/lib/fluent/plugin/filter_kubernetes_metadata.rb:227:in `configure'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:145:in `add_filter'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:62:in `block in configure'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:57:in `each'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:57:in `configure'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `block in configure'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `each'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `configure'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/engine.rb:129:in `configure'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/engine.rb:103:in `run_configure'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:489:in `run_configure'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:174:in `block in start'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:366:in `call'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:366:in `main_process'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:170:in `start'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/command/fluentd.rb:173:in `<top (required)>'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/bin/fluentd:8:in `<top (required)>'
  2018-06-13 10:07:54 -0700 [error]: /usr/bin/fluentd:23:in `load'
  2018-06-13 10:07:54 -0700 [error]: /usr/bin/fluentd:23:in `<main>'

Comment 1 Steven Walter 2018-06-14 18:34:23 UTC
Using "oc debug" we were able to get some goof info from the fluentd pod context:

# env | sort
BUFFER_QUEUE_LIMIT=32
BUFFER_SIZE_LIMIT=8m
DATA_VERSION=1.6.0
ES_CA=/etc/fluent/keys/ca
ES_CLIENT_CERT=/etc/fluent/keys/cert
ES_CLIENT_KEY=/etc/fluent/keys/key
ES_HOST=logging-es
ES_PORT=9200
FILE_BUFFER_LIMIT=256Mi
FLUENTD_AUDIT_LOG_PARSER_VERSION=0.0.5
FLUENTD_CPU_LIMIT=4
FLUENTD_ES=1.13.0-1
FLUENTD_KUBE_METADATA=1.0.1-1
FLUENTD_MEMORY_LIMIT=536870912
FLUENTD_RECORD_MODIFIER=0.6.1
FLUENTD_REWRITE_TAG=1.5.6-1
FLUENTD_SECURE_FWD=0.4.5-2
FLUENTD_SYSTEMD=0.0.9-1
FLUENTD_VERSION=0.12.42
FLUENTD_VIAQ_DATA_MODEL=0.0.13
GEM_HOME=/opt/app-root/src
HOME=/opt/app-root/src
HOSTNAME=logging-fluentd-d9dld-debug
JOURNAL_READ_FROM_HEAD=
JOURNAL_SOURCE=
K8S_HOST_URL=https://kubernetes.default.svc.cluster.local
KUBERNETES_PORT=tcp://172.30.0.1:443
KUBERNETES_PORT_443_TCP=tcp://172.30.0.1:443
KUBERNETES_PORT_443_TCP_ADDR=172.30.0.1
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_PORT_53_TCP=tcp://172.30.0.1:53
KUBERNETES_PORT_53_TCP_ADDR=172.30.0.1
KUBERNETES_PORT_53_TCP_PORT=53
KUBERNETES_PORT_53_TCP_PROTO=tcp
KUBERNETES_PORT_53_UDP=udp://172.30.0.1:53
KUBERNETES_PORT_53_UDP_ADDR=172.30.0.1
KUBERNETES_PORT_53_UDP_PORT=53
KUBERNETES_PORT_53_UDP_PROTO=udp
KUBERNETES_SERVICE_HOST=172.30.0.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_DNS=53
KUBERNETES_SERVICE_PORT_DNS_TCP=53
KUBERNETES_SERVICE_PORT_HTTPS=443
LOGGING_ES_CLUSTER_PORT=tcp://172.30.222.181:9300
LOGGING_ES_CLUSTER_PORT_9300_TCP=tcp://172.30.222.181:9300
LOGGING_ES_CLUSTER_PORT_9300_TCP_ADDR=172.30.222.181
LOGGING_ES_CLUSTER_PORT_9300_TCP_PORT=9300
LOGGING_ES_CLUSTER_PORT_9300_TCP_PROTO=tcp
LOGGING_ES_CLUSTER_SERVICE_HOST=172.30.222.181
LOGGING_ES_CLUSTER_SERVICE_PORT=9300
LOGGING_ES_PORT=tcp://172.30.74.55:9200
LOGGING_ES_PORT_9200_TCP=tcp://172.30.74.55:9200
LOGGING_ES_PORT_9200_TCP_ADDR=172.30.74.55
LOGGING_ES_PORT_9200_TCP_PORT=9200
LOGGING_ES_PORT_9200_TCP_PROTO=tcp
LOGGING_ES_PROMETHEUS_PORT=tcp://172.30.170.82:443
LOGGING_ES_PROMETHEUS_PORT_443_TCP=tcp://172.30.170.82:443
LOGGING_ES_PROMETHEUS_PORT_443_TCP_ADDR=172.30.170.82
LOGGING_ES_PROMETHEUS_PORT_443_TCP_PORT=443
LOGGING_ES_PROMETHEUS_PORT_443_TCP_PROTO=tcp
LOGGING_ES_PROMETHEUS_SERVICE_HOST=172.30.170.82
LOGGING_ES_PROMETHEUS_SERVICE_PORT=443
LOGGING_ES_PROMETHEUS_SERVICE_PORT_PROXY=443
LOGGING_ES_SERVICE_HOST=172.30.74.55
LOGGING_ES_SERVICE_PORT=9200
LOGGING_KIBANA_PORT=tcp://172.30.60.146:443
LOGGING_KIBANA_PORT_443_TCP=tcp://172.30.60.146:443
LOGGING_KIBANA_PORT_443_TCP_ADDR=172.30.60.146
LOGGING_KIBANA_PORT_443_TCP_PORT=443
LOGGING_KIBANA_PORT_443_TCP_PROTO=tcp
LOGGING_KIBANA_SERVICE_HOST=172.30.60.146
LOGGING_KIBANA_SERVICE_PORT=443
OCP_OPERATIONS_PROJECTS=default openshift openshift-
OPS_CA=/etc/fluent/keys/ops-ca
OPS_CLIENT_CERT=/etc/fluent/keys/ops-cert
OPS_CLIENT_KEY=/etc/fluent/keys/ops-key
OPS_HOST=logging-es
OPS_PORT=9200
PATH=/opt/app-root/src/bin:/opt/app-root/bin:/usr/libexec/fluentd/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/opt/app-root/src
RUBY_VERSION=2.0
SHLVL=1
TERM=xterm
_=/usr/bin/env
container=oci
sh-4.2#
sh-4.2#
sh-4.2# curl -kv $ES_HOST:9200
* About to connect() to logging-es port 9200 (#0)
*   Trying 172.30.74.55...
* Connected to logging-es (172.30.74.55) port 9200 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: logging-es:9200
> Accept: */*
>
* Empty reply from server
* Connection #0 to host logging-es left intact
curl: (52) Empty reply from server
sh-4.2#
sh-4.2#
sh-4.2# curl -kv logging-es:9200
* About to connect() to logging-es port 9200 (#0)
*   Trying 172.30.74.55...
* Connected to logging-es (172.30.74.55) port 9200 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: logging-es:9200
> Accept: */*
>
* Empty reply from server
* Connection #0 to host logging-es left intact
curl: (52) Empty reply from server
sh-4.2#
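
Note: the empty reply above is likely expected, since this Elasticsearch instance serves only TLS on 9200, so a plain-HTTP GET gets no body back. A TLS check reusing the client cert variables already exported in the pod environment would look roughly like:

curl -v --cacert $ES_CA --cert $ES_CLIENT_CERT --key $ES_CLIENT_KEY https://$ES_HOST:$ES_PORT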

Comment 3 Jeff Cantrill 2018-06-14 18:41:42 UTC

*** This bug has been marked as a duplicate of bug 1560170 ***

Comment 4 Steven Walter 2018-06-15 17:53:01 UTC
Re-opening. The customer has the latest image and is still seeing the issue, so I think this might be something new.

registry.access.redhat.com/openshift3/logging-fluentd         latest              3f822960402b        2 weeks ago         286 MB
registry.access.redhat.com/openshift3/logging-fluentd         v3.9.30             3f822960402b        2 weeks ago         286 MB

Uploading the fluentd log captured at trace level.

Comment 6 Rich Megginson 2018-06-15 18:01:05 UTC
Please provide the output of logging-dump.sh - the script also tests inter-pod connectivity, to rule out problems with e.g. OVS or other network virtualization.

Comment 7 Steven Walter 2018-06-15 20:58:39 UTC
This was attached at  attachment 1451554 [details] -- though the info you're looking for might be missing since the fluentd pods weren't running. Anything you need us to run in an "oc debug" context?

# oc debug logging-fluentd-<podname>

This lets us spin up a copy of the pod without running the image's CMD, so we can test things from the perspective of the fluentd pod.
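
oc debug also accepts a command after --, so the same kind of check can be run non-interactively; a sketch, reusing the plain-HTTP check from comment 1:

oc debug logging-fluentd-<podname> -- curl -kv logging-es:9200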

Comment 8 Rich Megginson 2018-06-15 23:01:51 UTC
It isn't the connectivity to elasticsearch that is the issue - it is the connectivity to kubernetes:

  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:322:in `api_valid?'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-1.0.3/lib/fluent/plugin/filter_kubernetes_metadata.rb:227:in `configure'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:145:in `add_filter'
...
  2018-06-13 10:07:54 -0700 [error]: /usr/bin/fluentd:23:in `<main>'

This means the failure is in the kubeclient code (talking to the Kubernetes API), not in the elasticsearch rest-client code.

Try this using oc debug logging-fluentd-<podname>:

curl -v https://kubernetes.default.svc.cluster.local --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
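
If that connects but returns 401/403, the same check can be repeated with the pod's service account token to confirm the API actually answers (a sketch using the standard in-pod service account paths):

TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H "Authorization: Bearer $TOKEN" https://kubernetes.default.svc.cluster.local/api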