Bug 1591452
Summary: | fluentd fails with unexpected error error="Connection refused - connect(2)" | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Steven Walter <stwalter> |
Component: | Networking | Assignee: | Casey Callendrello <cdc> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Meng Bo <bmeng> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | | |
Version: | 3.9.0 | CC: | aos-bugs, pasik, rmeggins |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | 3.9.z | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2018-11-02 18:46:12 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Steven Walter
2018-06-14 18:30:05 UTC
Using "oc debug" we were able to get some good info from the fluentd pod context:

# env | sort
BUFFER_QUEUE_LIMIT=32
BUFFER_SIZE_LIMIT=8m
DATA_VERSION=1.6.0
ES_CA=/etc/fluent/keys/ca
ES_CLIENT_CERT=/etc/fluent/keys/cert
ES_CLIENT_KEY=/etc/fluent/keys/key
ES_HOST=logging-es
ES_PORT=9200
FILE_BUFFER_LIMIT=256Mi
FLUENTD_AUDIT_LOG_PARSER_VERSION=0.0.5
FLUENTD_CPU_LIMIT=4
FLUENTD_ES=1.13.0-1
FLUENTD_KUBE_METADATA=1.0.1-1
FLUENTD_MEMORY_LIMIT=536870912
FLUENTD_RECORD_MODIFIER=0.6.1
FLUENTD_REWRITE_TAG=1.5.6-1
FLUENTD_SECURE_FWD=0.4.5-2
FLUENTD_SYSTEMD=0.0.9-1
FLUENTD_VERSION=0.12.42
FLUENTD_VIAQ_DATA_MODEL=0.0.13
GEM_HOME=/opt/app-root/src
HOME=/opt/app-root/src
HOSTNAME=logging-fluentd-d9dld-debug
JOURNAL_READ_FROM_HEAD=
JOURNAL_SOURCE=
K8S_HOST_URL=https://kubernetes.default.svc.cluster.local
KUBERNETES_PORT=tcp://172.30.0.1:443
KUBERNETES_PORT_443_TCP=tcp://172.30.0.1:443
KUBERNETES_PORT_443_TCP_ADDR=172.30.0.1
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_PORT_53_TCP=tcp://172.30.0.1:53
KUBERNETES_PORT_53_TCP_ADDR=172.30.0.1
KUBERNETES_PORT_53_TCP_PORT=53
KUBERNETES_PORT_53_TCP_PROTO=tcp
KUBERNETES_PORT_53_UDP=udp://172.30.0.1:53
KUBERNETES_PORT_53_UDP_ADDR=172.30.0.1
KUBERNETES_PORT_53_UDP_PORT=53
KUBERNETES_PORT_53_UDP_PROTO=udp
KUBERNETES_SERVICE_HOST=172.30.0.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_DNS=53
KUBERNETES_SERVICE_PORT_DNS_TCP=53
KUBERNETES_SERVICE_PORT_HTTPS=443
LOGGING_ES_CLUSTER_PORT=tcp://172.30.222.181:9300
LOGGING_ES_CLUSTER_PORT_9300_TCP=tcp://172.30.222.181:9300
LOGGING_ES_CLUSTER_PORT_9300_TCP_ADDR=172.30.222.181
LOGGING_ES_CLUSTER_PORT_9300_TCP_PORT=9300
LOGGING_ES_CLUSTER_PORT_9300_TCP_PROTO=tcp
LOGGING_ES_CLUSTER_SERVICE_HOST=172.30.222.181
LOGGING_ES_CLUSTER_SERVICE_PORT=9300
LOGGING_ES_PORT=tcp://172.30.74.55:9200
LOGGING_ES_PORT_9200_TCP=tcp://172.30.74.55:9200
LOGGING_ES_PORT_9200_TCP_ADDR=172.30.74.55
LOGGING_ES_PORT_9200_TCP_PORT=9200
LOGGING_ES_PORT_9200_TCP_PROTO=tcp
LOGGING_ES_PROMETHEUS_PORT=tcp://172.30.170.82:443
LOGGING_ES_PROMETHEUS_PORT_443_TCP=tcp://172.30.170.82:443
LOGGING_ES_PROMETHEUS_PORT_443_TCP_ADDR=172.30.170.82
LOGGING_ES_PROMETHEUS_PORT_443_TCP_PORT=443
LOGGING_ES_PROMETHEUS_PORT_443_TCP_PROTO=tcp
LOGGING_ES_PROMETHEUS_SERVICE_HOST=172.30.170.82
LOGGING_ES_PROMETHEUS_SERVICE_PORT=443
LOGGING_ES_PROMETHEUS_SERVICE_PORT_PROXY=443
LOGGING_ES_SERVICE_HOST=172.30.74.55
LOGGING_ES_SERVICE_PORT=9200
LOGGING_KIBANA_PORT=tcp://172.30.60.146:443
LOGGING_KIBANA_PORT_443_TCP=tcp://172.30.60.146:443
LOGGING_KIBANA_PORT_443_TCP_ADDR=172.30.60.146
LOGGING_KIBANA_PORT_443_TCP_PORT=443
LOGGING_KIBANA_PORT_443_TCP_PROTO=tcp
LOGGING_KIBANA_SERVICE_HOST=172.30.60.146
LOGGING_KIBANA_SERVICE_PORT=443
OCP_OPERATIONS_PROJECTS=default openshift openshift-
OPS_CA=/etc/fluent/keys/ops-ca
OPS_CLIENT_CERT=/etc/fluent/keys/ops-cert
OPS_CLIENT_KEY=/etc/fluent/keys/ops-key
OPS_HOST=logging-es
OPS_PORT=9200
PATH=/opt/app-root/src/bin:/opt/app-root/bin:/usr/libexec/fluentd/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/opt/app-root/src
RUBY_VERSION=2.0
SHLVL=1
TERM=xterm
_=/usr/bin/env
container=oci

sh-4.2# curl -kv $ES_HOST:9200
* About to connect() to logging-es port 9200 (#0)
* Trying 172.30.74.55...
* Connected to logging-es (172.30.74.55) port 9200 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: logging-es:9200
> Accept: */*
>
* Empty reply from server
* Connection #0 to host logging-es left intact
curl: (52) Empty reply from server

sh-4.2# curl -kv logging-es:9200
* About to connect() to logging-es port 9200 (#0)
* Trying 172.30.74.55...
* Connected to logging-es (172.30.74.55) port 9200 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: logging-es:9200
> Accept: */*
>
* Empty reply from server
* Connection #0 to host logging-es left intact
curl: (52) Empty reply from server

*** This bug has been marked as a duplicate of bug 1560170 ***

Re-opening.
Customer has the latest image and is still seeing the issue, so I think this might be something new.

registry.access.redhat.com/openshift3/logging-fluentd latest 3f822960402b 2 weeks ago 286 MB
registry.access.redhat.com/openshift3/logging-fluentd v3.9.30 3f822960402b 2 weeks ago 286 MB

Uploading fluentd log with trace level.

Please provide the output of logging-dump.sh. This script also tests inter-pod connectivity, which rules out problems with e.g. OVS or other network virtualization.

This was attached as attachment 1451554, though the info you're looking for might be missing since the fluentd pods weren't running. Anything you need us to run in an "oc debug" context?
# oc debug logging-fluentd-<podname>
^Allows us to spin up a pod without running the CMD so we can test things from the perspective of the fluentd pod
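One thing that could be run from that debug shell: the curl tests in the description sent plain HTTP to port 9200 and got an empty reply (curl exit code 52). That result would be consistent with the port expecting TLS, but this is an assumption and the thread does not confirm it. A minimal sketch of a TLS request that uses the certificate paths already exported in the pod environment (ES_CA, ES_CLIENT_CERT, ES_CLIENT_KEY); the helper name es_curl_args is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the curl arguments for a TLS request to
# Elasticsearch, one per line, built from variables in the pod environment.
# Assumption: the service on ES_PORT expects TLS with a client certificate.
es_curl_args() {
    printf '%s\n' \
        -sv \
        --cacert "${ES_CA:-/etc/fluent/keys/ca}" \
        --cert "${ES_CLIENT_CERT:-/etc/fluent/keys/cert}" \
        --key "${ES_CLIENT_KEY:-/etc/fluent/keys/key}" \
        "https://${ES_HOST:-logging-es}:${ES_PORT:-9200}/"
}

# Only issue the request when invoked with "run", so the file can be
# sourced without touching the network.
if [ "${1:-}" = "run" ]; then
    # Word splitting is intentional here: none of the paths contain spaces.
    # shellcheck disable=SC2046
    curl $(es_curl_args)
fi
```

If this returns a JSON banner while the plain HTTP request does not, the empty reply was probably only a protocol mismatch and not a network fault.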
It isn't the connectivity to Elasticsearch that is the issue; it is the connectivity to Kubernetes:

2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:322:in `api_valid?'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-1.0.3/lib/fluent/plugin/filter_kubernetes_metadata.rb:227:in `configure'
2018-06-13 10:07:54 -0700 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:145:in `add_filter'
...
2018-06-13 10:07:54 -0700 [error]: /usr/bin/fluentd:23:in `<main>'

The trace goes through the kubeclient code, not the Elasticsearch REST code. Try this using oc debug logging-fluentd-<podname>:

curl -v https://kubernetes.default.svc.cluster.local --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
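A rough sketch of how the result of that curl test might be interpreted, assuming the standard curl exit codes (6, 7, 28, 60). The function names explain_curl_exit and check_k8s_api are hypothetical, and the hints are starting points, not a diagnosis. Exit code 7 is the case that matches the "Connection refused - connect(2)" error reported by fluentd:

```shell
#!/bin/sh
# Hypothetical helper: map a curl exit code to a hint about where to look.
explain_curl_exit() {
    case "$1" in
        0)  echo "API reachable and certificate verified" ;;
        6)  echo "could not resolve host: check cluster DNS" ;;
        7)  echo "failed to connect: connection refused or no route to the API service" ;;
        28) echo "timed out: packets are possibly being dropped" ;;
        60) echo "TLS verification failed: check the service account CA" ;;
        *)  echo "curl exit code $1: see the EXIT CODES section of curl(1)" ;;
    esac
}

# Same request as suggested in the thread, with a timeout added.
# K8S_HOST_URL comes from the pod environment; the default is the URL
# used in the thread.
check_k8s_api() {
    curl -sv -o /dev/null --max-time 10 \
        --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
        "${K8S_HOST_URL:-https://kubernetes.default.svc.cluster.local}"
    explain_curl_exit "$?"
}
```

Calling check_k8s_api inside the oc debug shell prints the verbose curl trace followed by one hint line. An HTTP error status such as 403 still counts as reachable here, since only the connection and the TLS handshake are being tested.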