Bug 1591452 - fluentd fails with unexpected error error="Connection refused - connect(2)"
Summary: fluentd fails with unexpected error error="Connection refused - connect(2)"
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.9.z
Assignee: Casey Callendrello
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-14 18:30 UTC by Steven Walter
Modified: 2018-11-02 18:46 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-02 18:46:12 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Steven Walter 2018-06-14 18:30:05 UTC
Description of problem:
An attempt was made to install logging, but the fluentd pods fail to start with a connection-refused error. The error does not say what fluentd is trying to connect to, and manual connection attempts from a debug pod succeed.

Version-Release number of selected component (if applicable):
registry.access.redhat.com/openshift3/logging-fluentd:v3.9.30

How reproducible:
Unconfirmed in test env


atomic install --system \
  --storage=ostree \
  --set INVENTORY_FILE=/home/cloud-user/openshift/hosts \
  --set PLAYBOOK_FILE=/usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml \
  --set OPTS="-v" \
  registry.access.redhat.com/openshift3/ose-ansible:v3.9

LOGS FROM logging-fluentd-* 

umounts of dead containers will fail. Ignoring...
umount: /var/lib/docker/containers/16ab49d58ba6124122c4f88821865e5f441a38e4b18dba64dff498b5d6bb6306/shm: not mounted
umount: /var/lib/docker/containers/2bf9d91728f60b0554d0f17f00c03d6b5f467e4c9713f60a5131412f75b28775/shm: not mounted
umount: /var/lib/docker/containers/3f68c6647495ea44970612f5cc26f927daa2a3fbcebd1f728b3cceb447454f41/shm: not mounted
umount: /var/lib/docker/containers/484f42ab3033d062c4dd46da8188392eb42db71b473be82b729bd79179fadd58/shm: not mounted
umount: /var/lib/docker/containers/6fb9739957b02f341d4c7e19a86d377beeb053ae609830482f0b5295345e9a30/shm: not mounted
umount: /var/lib/docker/containers/7a2b21769572165fea78bc1a6eeedb5fb2925c395c1038692a65988065c5a73c/shm: not mounted
umount: /var/lib/docker/containers/86d9a4f499fe9452ed469273c7e1720f943ba32a6a61522e74ded61296c73745/shm: not mounted
umount: /var/lib/docker/containers/c63999ef3fa5a974dace782696edb40e78f59ce8ff2a1845869fed991356fd08/shm: not mounted
umount: /var/lib/docker/containers/e8c9cd0ca4114504a2c5471da8e00c1f686104f9a0c5147b9fb51ec2fce86839/shm: not mounted
umount: /var/lib/docker/containers/fa7070323fc6f10519e78710c57802512e219cc10c76ee0d68e901230241a175/shm: not mounted
[DEPRECATION WARNING]: 'merge_json_log' for the fluent-plugin-kubernetes_metadata_filter will no longer be supported in future releases
2018-06-13 10:07:53 -0700 [info]: reading config file path="/etc/fluent/fluent.conf"
2018-06-13 10:07:54 -0700 [error]: unexpected error error="Connection refused - connect(2)"
  2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/net/http.rb:878:in `initialize'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/net/http.rb:878:in `open'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/net/http.rb:878:in `block in connect'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/timeout.rb:52:in `timeout'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/net/http.rb:877:in `connect'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/net/http.rb:862:in `do_start'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/ruby/net/http.rb:851:in `start'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:715:in `transmit'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:145:in `execute'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:52:in `execute'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/resource.rb:51:in `get'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:328:in `block in api'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:58:in `handle_exception'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:327:in `api'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:322:in `api_valid?'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-1.0.3/lib/fluent/plugin/filter_kubernetes_metadata.rb:227:in `configure'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:145:in `add_filter'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:62:in `block in configure'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:57:in `each'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:57:in `configure'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `block in configure'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `each'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `configure'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/engine.rb:129:in `configure'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/engine.rb:103:in `run_configure'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:489:in `run_configure'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:174:in `block in start'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:366:in `call'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:366:in `main_process'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:170:in `start'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/command/fluentd.rb:173:in `<top (required)>'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/bin/fluentd:8:in `<top (required)>'
  2018-06-13 10:07:54 -0700 [error]: /usr/bin/fluentd:23:in `load'
  2018-06-13 10:07:54 -0700 [error]: /usr/bin/fluentd:23:in `<main>'
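For context on where that message comes from: the trace shows the failure during filter configuration, inside Net::HTTP's connect. A minimal sketch (not the plugin's actual code, just an illustration of the failure mode) of how a refused TCP connect surfaces as this exact error string:

```ruby
require 'net/http'

# During configure, the kubernetes_metadata filter calls the Kubernetes API
# via kubeclient. If the underlying TCP connect is refused, Ruby raises
# Errno::ECONNREFUSED, whose message begins "Connection refused - connect(2)"
# -- the error seen in the log above. This helper reproduces that mapping.
def api_reachable?(host, port)
  Net::HTTP.start(host, port, open_timeout: 2) { true }
rescue Errno::ECONNREFUSED, Net::OpenTimeout
  false
end
```

Note the error tells you the syscall (connect) but not the target host/port, which is why the log alone does not say what fluentd was trying to reach.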

Comment 1 Steven Walter 2018-06-14 18:34:23 UTC
Using "oc debug" we were able to get some good info from the fluentd pod context:

# env | sort
BUFFER_QUEUE_LIMIT=32
BUFFER_SIZE_LIMIT=8m
DATA_VERSION=1.6.0
ES_CA=/etc/fluent/keys/ca
ES_CLIENT_CERT=/etc/fluent/keys/cert
ES_CLIENT_KEY=/etc/fluent/keys/key
ES_HOST=logging-es
ES_PORT=9200
FILE_BUFFER_LIMIT=256Mi
FLUENTD_AUDIT_LOG_PARSER_VERSION=0.0.5
FLUENTD_CPU_LIMIT=4
FLUENTD_ES=1.13.0-1
FLUENTD_KUBE_METADATA=1.0.1-1
FLUENTD_MEMORY_LIMIT=536870912
FLUENTD_RECORD_MODIFIER=0.6.1
FLUENTD_REWRITE_TAG=1.5.6-1
FLUENTD_SECURE_FWD=0.4.5-2
FLUENTD_SYSTEMD=0.0.9-1
FLUENTD_VERSION=0.12.42
FLUENTD_VIAQ_DATA_MODEL=0.0.13
GEM_HOME=/opt/app-root/src
HOME=/opt/app-root/src
HOSTNAME=logging-fluentd-d9dld-debug
JOURNAL_READ_FROM_HEAD=
JOURNAL_SOURCE=
K8S_HOST_URL=https://kubernetes.default.svc.cluster.local
KUBERNETES_PORT=tcp://172.30.0.1:443
KUBERNETES_PORT_443_TCP=tcp://172.30.0.1:443
KUBERNETES_PORT_443_TCP_ADDR=172.30.0.1
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_PORT_53_TCP=tcp://172.30.0.1:53
KUBERNETES_PORT_53_TCP_ADDR=172.30.0.1
KUBERNETES_PORT_53_TCP_PORT=53
KUBERNETES_PORT_53_TCP_PROTO=tcp
KUBERNETES_PORT_53_UDP=udp://172.30.0.1:53
KUBERNETES_PORT_53_UDP_ADDR=172.30.0.1
KUBERNETES_PORT_53_UDP_PORT=53
KUBERNETES_PORT_53_UDP_PROTO=udp
KUBERNETES_SERVICE_HOST=172.30.0.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_DNS=53
KUBERNETES_SERVICE_PORT_DNS_TCP=53
KUBERNETES_SERVICE_PORT_HTTPS=443
LOGGING_ES_CLUSTER_PORT=tcp://172.30.222.181:9300
LOGGING_ES_CLUSTER_PORT_9300_TCP=tcp://172.30.222.181:9300
LOGGING_ES_CLUSTER_PORT_9300_TCP_ADDR=172.30.222.181
LOGGING_ES_CLUSTER_PORT_9300_TCP_PORT=9300
LOGGING_ES_CLUSTER_PORT_9300_TCP_PROTO=tcp
LOGGING_ES_CLUSTER_SERVICE_HOST=172.30.222.181
LOGGING_ES_CLUSTER_SERVICE_PORT=9300
LOGGING_ES_PORT=tcp://172.30.74.55:9200
LOGGING_ES_PORT_9200_TCP=tcp://172.30.74.55:9200
LOGGING_ES_PORT_9200_TCP_ADDR=172.30.74.55
LOGGING_ES_PORT_9200_TCP_PORT=9200
LOGGING_ES_PORT_9200_TCP_PROTO=tcp
LOGGING_ES_PROMETHEUS_PORT=tcp://172.30.170.82:443
LOGGING_ES_PROMETHEUS_PORT_443_TCP=tcp://172.30.170.82:443
LOGGING_ES_PROMETHEUS_PORT_443_TCP_ADDR=172.30.170.82
LOGGING_ES_PROMETHEUS_PORT_443_TCP_PORT=443
LOGGING_ES_PROMETHEUS_PORT_443_TCP_PROTO=tcp
LOGGING_ES_PROMETHEUS_SERVICE_HOST=172.30.170.82
LOGGING_ES_PROMETHEUS_SERVICE_PORT=443
LOGGING_ES_PROMETHEUS_SERVICE_PORT_PROXY=443
LOGGING_ES_SERVICE_HOST=172.30.74.55
LOGGING_ES_SERVICE_PORT=9200
LOGGING_KIBANA_PORT=tcp://172.30.60.146:443
LOGGING_KIBANA_PORT_443_TCP=tcp://172.30.60.146:443
LOGGING_KIBANA_PORT_443_TCP_ADDR=172.30.60.146
LOGGING_KIBANA_PORT_443_TCP_PORT=443
LOGGING_KIBANA_PORT_443_TCP_PROTO=tcp
LOGGING_KIBANA_SERVICE_HOST=172.30.60.146
LOGGING_KIBANA_SERVICE_PORT=443
OCP_OPERATIONS_PROJECTS=default openshift openshift-
OPS_CA=/etc/fluent/keys/ops-ca
OPS_CLIENT_CERT=/etc/fluent/keys/ops-cert
OPS_CLIENT_KEY=/etc/fluent/keys/ops-key
OPS_HOST=logging-es
OPS_PORT=9200
PATH=/opt/app-root/src/bin:/opt/app-root/bin:/usr/libexec/fluentd/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/opt/app-root/src
RUBY_VERSION=2.0
SHLVL=1
TERM=xterm
_=/usr/bin/env
container=oci
sh-4.2#
sh-4.2#
sh-4.2# curl -kv $ES_HOST:9200
* About to connect() to logging-es port 9200 (#0)
*   Trying 172.30.74.55...
* Connected to logging-es (172.30.74.55) port 9200 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: logging-es:9200
> Accept: */*
>
* Empty reply from server
* Connection #0 to host logging-es left intact
curl: (52) Empty reply from server
sh-4.2#
sh-4.2#
sh-4.2# curl -kv logging-es:9200
* About to connect() to logging-es port 9200 (#0)
*   Trying 172.30.74.55...
* Connected to logging-es (172.30.74.55) port 9200 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: logging-es:9200
> Accept: */*
>
* Empty reply from server
* Connection #0 to host logging-es left intact
curl: (52) Empty reply from server
sh-4.2#
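The "Empty reply from server" from plain-HTTP curl is consistent with logging-es expecting TLS on 9200 (the env dump above shows fluentd's client material at /etc/fluent/keys/{ca,cert,key}). A hedged sketch of how a TLS client to ES would be built with those mounted paths; the paths are taken from the env output, the request itself is not sent here:

```ruby
require 'net/http'

# Sketch only: build an HTTPS client for logging-es using the key material
# mounted into the fluentd pod. The cert/key loading lines are commented out
# because the files only exist inside the pod.
def es_client(host, port, ca: '/etc/fluent/keys/ca')
  http = Net::HTTP.new(host, port)
  http.use_ssl = true   # plain HTTP gets an empty reply if ES requires TLS
  http.ca_file = ca
  # Client cert/key would be loaded before starting the connection:
  # http.cert = OpenSSL::X509::Certificate.new(File.read('/etc/fluent/keys/cert'))
  # http.key  = OpenSSL::PKey.read(File.read('/etc/fluent/keys/key'))
  http
end
```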

Comment 3 Jeff Cantrill 2018-06-14 18:41:42 UTC

*** This bug has been marked as a duplicate of bug 1560170 ***

Comment 4 Steven Walter 2018-06-15 17:53:01 UTC
Re-opening. The customer has the latest image and is still seeing the issue, so I think this might be something new.

registry.access.redhat.com/openshift3/logging-fluentd         latest              3f822960402b        2 weeks ago         286 MB
registry.access.redhat.com/openshift3/logging-fluentd         v3.9.30             3f822960402b        2 weeks ago         286 MB

Uploading fluentd log with trace level

Comment 6 Rich Megginson 2018-06-15 18:01:05 UTC
Please provide the output of logging-dump.sh - this script also tests inter-pod connectivity, to rule out problems with e.g. OVS or other network virtualization.

Comment 7 Steven Walter 2018-06-15 20:58:39 UTC
This was attached at attachment 1451554 [details] -- though the info you're looking for might be missing, since the fluentd pods weren't running. Is there anything you need us to run in an "oc debug" context?

# oc debug logging-fluentd-<podname>

^ This lets us spin up a pod without running the image's CMD, so we can test things from the perspective of the fluentd pod.

Comment 8 Rich Megginson 2018-06-15 23:01:51 UTC
It isn't the connectivity to Elasticsearch that is the issue - it is the connectivity to Kubernetes:

  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:322:in `api_valid?'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-1.0.3/lib/fluent/plugin/filter_kubernetes_metadata.rb:227:in `configure'
  2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:145:in `add_filter'
...
  2018-06-13 10:07:54 -0700 [error]: /usr/bin/fluentd:23:in `<main>'

This means the failure is in the kubeclient code, not the Elasticsearch REST code.

Try this using oc debug logging-fluentd-<podname>:

curl -v https://kubernetes.default.svc.cluster.local --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
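The same check can be done in Ruby, shaped roughly like what kubeclient's api_valid? does (a GET against the API server root using the pod's serviceaccount CA). This is a hedged sketch, not the plugin's actual source; the URL and CA path are the standard in-cluster values shown earlier in this bug:

```ruby
require 'net/http'
require 'uri'

K8S_URL = 'https://kubernetes.default.svc.cluster.local'
SA_CA   = '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'

# Returns the HTTP status code as a string if the API server answers, or
# 'connection refused' if the TCP connect fails the same way fluentd's did.
def check_api(url = K8S_URL, ca = SA_CA)
  uri  = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  http.ca_file = ca if File.exist?(ca)
  http.open_timeout = 3
  http.start { |h| h.get('/api').code }
rescue Errno::ECONNREFUSED
  'connection refused'
end
```

Run inside the debug pod: a refused connect here reproduces the startup failure, while any HTTP status (even 403) means the API server is reachable and the problem is elsewhere.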

